Supposedly, the technology world is buzzing about “big data”. When you say “big data”, you have to imagine it's capitalized. So, when I first heard it, I thought it was like Big Oil or Big Tobacco. Gee, I thought: so there's a big conspiracy of organizations controlling all the world's data. Great!
But no. Big data just means – lots of data. The idea is that there's now so much data sloshing and zipping around the world that... And that's where I lose the thread. I think what we're being pitched here is a revived focus on analytics. You've got lots of data: you need to get better value from it.
The Wikipedia entry for “Big Data” – as good a place to look as any, in this case – says that big data means datasets so large that they are “awkward” to work with using current technology.
If this is so, then I guess for many years the threshold of big data was 640KB. But to be fair, the sense of “awkward” being used here has to do with databases, and there may well be practical limits to the ability of relational databases to handle very large datasets within acceptable processing times.
I'd like to go against the grain and stoke up some complacency on this subject. I say: Don't worry about big data. A large proportion of your data is junk. If you use sound information modelling and management techniques, you can reduce your data holdings considerably. If the datasets you're left with are too big to analyse, then start to look for technologies that can handle them – but only if you can make a case for the investment.
The one and only law of computing that everyone agrees has been completely reliable is that processing power increases while its cost falls. I'd suggest that the continuing improvements in data storage capacity per dollar and in network bandwidth are also pretty reliable trends. Given these trends, I'd expect today's big data to be tomorrow's thumb drive.