Ian Kalin at Socrata has written an excellent piece on open data that pinpoints and answers five key questions city leaders and CIOs raise about it. Kalin's second question, together with his answer, is so good that I want to quote it in full: “Hasn’t Big Data already solved the problem of messy data? Tools exist to help mobile applications understand the similarities between a data field labeled 'Address' and one labeled 'Street, City, State'. The real problem is the scale of deployment. For example, a San Francisco-based application that places food inspection scores in people’s hands cannot easily scale to Philadelphia if scores are calculated differently in each city, or if the data isn’t available in real time via an Application Programming Interface (API).”
I'd be grateful for Kalin's articulation of the question alone, never mind the answer. It names an important, and often unspoken, misconception about today's data landscape. And the answer makes clear that no amount of technological speed can substitute for conceptual clarity. Even if you could always assemble the required data on the fly from disparate sources, you would still need to make that data available in a comprehensible, stable, and published format. Data standards are an inherent part of exploiting data.