Sean Martin pens a great piece on data lakes, explaining what they are, their advantages and disadvantages, and how he sees the field developing over the next year. It's useful orientation for anyone who is negotiating the growth of big data in their organization or responding to increased demands for fast access to raw data.
Martin is clear that too many data lakes are being populated with data that's effectively unusable because it lacks context. Some organizations have traded the slow but structuring bottleneck of the data architecture team and the DBAs for rapid availability of inconsistent, untrusted, and even unintelligible data. It's cheaper to hold all this stuff than it was under the old data warehouse regime. But if you can't exploit it, what have you gained?
The answer to the problem is semantic modeling, which adds context and meaning back into large data stores without falling back to older technologies. Although Martin does not cover semantic modeling and ontologies in depth in this piece, I don't think he means to imply that reinjecting these structural forces is trivial. You still need smart people to do this. It's a matter of using established skills in a new technical environment.
If you wanted to go deep on this issue, you could say it's an example of the age-old struggle between order and chaos, or maybe, more accurately, a contest over whether to privilege foresight or insight. We used to value centralized organizations that planned ahead and imposed order. Now business moves faster than ever before, so we value organizations that are distributed and empowered to make their own futures using their creativity.
The reality is, you've got to have both. It's hard, because they're forces that naturally want to oppose each other. Harnessing and directing both impulses in the service of the organization's goals is right at the heart of contemporary leadership.