Back in the day, data was a topic for geeks
– and these were not even alpha geeks (sorry guys). Now data is everybody's
concern, because data about everybody is getting everywhere.
The British commentator Simon Jenkins
recently devoted a column to the dangers of government data getting into the
wrong hands. As well as likening the situation to the early days of atomic
power, he said: “Data protection is a blazing contradiction in terms.” Now, data protection is not a contradiction
in terms, let alone a blazing one. Jenkins probably means he doesn't trust
claims made by technologists that data protection is possible.
Though I think his tone is a little
hysterical, it's true that not enough has been done to guarantee the anonymity
of personal data released into the public domain. And yet this issue has been
flagged up by government itself. The so-called mosaic effect occurs when you
combine two or more data sources and reveal a pattern that isn't visible in any
single source.
It was the Health and Human Services Department that discovered
this, when it combined data about Medicare payments with Census data. The combination
inadvertently made some health information about people in low-population areas
potentially identifiable.
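To make the idea concrete, here is a minimal sketch in Python of how two harmless-looking data sets can combine into something riskier. Everything in it is an assumption for illustration: the figures are invented, the field names (area, condition, patients, population) are hypothetical, and the thresholds are arbitrary. It is not the method the Health and Human Services Department used.

```python
# Invented figures, for illustration only.
payments = [  # first published data set: patient counts per area
    {"area": "A", "condition": "X", "patients": 120},
    {"area": "B", "condition": "X", "patients": 2},
    {"area": "C", "condition": "X", "patients": 45},
]
census = [  # second published data set: population per area
    {"area": "A", "population": 50000},
    {"area": "B", "population": 40},
    {"area": "C", "population": 12000},
]


def risky_cells(payments, census, min_population=100, min_patients=10):
    """Join the two sources on area and return rows where very few people
    share a cell. The thresholds are arbitrary placeholders."""
    population = {row["area"]: row["population"] for row in census}
    flagged = []
    for row in payments:
        pop = population.get(row["area"])
        if pop is None:
            continue  # the sources share no key here, so nothing to combine
        if pop < min_population and row["patients"] < min_patients:
            flagged.append({**row, "population": pop})
    return flagged


flagged = risky_cells(payments, census)
for row in flagged:
    print(row["area"], row["patients"], row["population"])
```

Neither table looks sensitive on its own. Only the join shows that area B has two patients in a population of 40, which is the kind of cell where individuals start to become guessable.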
The “mosaic” terminology comes from
database marketing, the field that discovered long ago how combining different
data sets could lead to increasingly accurate predictions about people's income
or propensity to buy certain products and services.
You could argue that data standards only
help people to create mosaic views from multiple data sources. You might
conclude that the more disparity we have in public data sets, the safer the
public will be.
I don't think that's right. We actually
need data standards in order to map potential mosaics before the hackers get to
them. Then we can restrict the relevant data sets accordingly. Standards will
let us see actual business relationships – such as the ones revealed by the
Medicare/Census mashup – ahead of data release. In this way, standards can help
us safeguard privacy.