AIIM released the results of their recent survey into big data. The title of the report is striking: Big Data – Extracting Value from Digital Landfills.
“Landfill” is a provocative word. The hope that animates analytics is that by-product data can be mined for nuggets of information value, or used to fuel new insights. Contents sent to landfill are waste products that can't be reused or recycled. The implication of the report's title is that advocates of big data are trying to perform alchemy by turning dross into gold.
Digging deeper (if you will), the challenge for organizations seems to be in managing and exploiting unstructured data. Tools are emerging to conjure meaning from unstructured data, but these are by no means mainstream yet. My impression is that organizations are hoarding unstructured data – pictures, charts, audio, video – in the hope that one day they'll get around to turning it into valuable information assets.
However, they might be better advised to catalog their unstructured data at an explicit archiving point, rather than storing up the comprehension and indexing task for some future technology to accomplish.
One reason for cataloging now rather than reverse engineering later is that business information has a lifetime. Much of your unstructured data will be valueless by the time you make it usable. Another reason is that the earlier you use business intelligence, the faster you accrue benefits.
Have you ever seen a pile of old photos or photo albums at a junk sale? Who are all these people, frozen forever in black and white? Perhaps there's a book like this in your family. The people who took the pictures saw no need to provide captions, because it didn't occur to them that the next generation wouldn't recognize all the uncles and aunts and cousins. A few minutes of labeling would have preserved the identities of the people in the pictures, but they've come down to us as anonymous folks.
So it is with an organization's unstructured data. Who is speaking in this video? What event are they speaking at? Where is the transcript of what they're saying? Believe it or not, a delay of a couple of years in transcribing speech can be enough to make topical references or even industry jargon incomprehensible. The speed of change we all talk about is, among other things, a force for semantic decay.
Tools for analyzing unstructured data will undoubtedly be useful. But don't ignore the power your people have right now to turn unstructured data into usable business intelligence. It's cheap and fast to use their knowledge in live mode. And that way you're building an organized store of unstructured information.
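To make the idea of cataloging at the archiving point a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `CatalogRecord` class, its field names, and the `missing_fields` and `to_sidecar_json` helpers are hypothetical, not part of the AIIM report or of any particular product. The record simply captures the answers to the questions above (who, what event, where is the transcript) while someone still knows them.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class CatalogRecord:
    """Hypothetical description captured when a media file is archived."""
    file_name: str
    media_type: str                 # e.g. "video", "audio", "picture", "chart"
    archived_on: str                # ISO date string
    speakers: list = field(default_factory=list)
    event: str = ""
    transcript_file: str = ""
    glossary: dict = field(default_factory=dict)  # jargon explained while it is still current


def missing_fields(record):
    """Return the names of descriptive fields that were left empty."""
    required = ["speakers", "event", "transcript_file"]
    return [name for name in required if not getattr(record, name)]


def to_sidecar_json(record):
    """Serialize the record so it can be stored next to the archived file."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values only; a real record would be filled in by the person archiving.
record = CatalogRecord(
    file_name="keynote.mp4",
    media_type="video",
    archived_on=date(2012, 6, 1).isoformat(),
    speakers=["(name of speaker)"],
    event="(name of event)",
)

print(missing_fields(record))  # ['transcript_file']
```

The point of the `missing_fields` check is that gaps get flagged on the day of archiving, when a colleague can still fill them in within a few minutes, rather than years later when nobody remembers.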