You may have heard the term “data lake”. You might even have one in your organization. If you do, it was most likely dug by technologists without any informed consent or encouragement from the business. Here's one way in which Gartner describe these constructions: “Data lakes focus on storing disparate data and ignore how or why data is used, governed, defined and secured.” Sounds more like a swamp, right?
Good intentions lie behind the data lake concept. Organizations suffer from information silos, so maybe it would help to put all the data together. Existing databases impose one kind of structure on data, which can make it difficult to exploit the data from another angle – and sometimes they discard or modify original aspects of the data which might be useful to someone else. So, pouring raw data into one place should eliminate these problems.
There are logical problems here. The first is that putting a bunch of silos together doesn't break down the silos. A data lake is merely a co-location of data silos, offering no business benefit beyond what the separate silos already provided. Second, it's a fallacy to argue that because a data structure impeded novel types of analysis, having no structure will enable them.
There's no side-stepping semantics. If you don't manage your data according to structures and relationships of meaning to the business, you will always struggle to collate data and derive value from it. Having your data in a metaphorical lake is no better than having it up a tree or buried in sand. The reality is, if you don't have structure, you don't really have data – you have noise.
I wonder what makes data lakes attractive. I guess there is hope that by gathering all data together in a pristine form, uncontaminated by line-of-business viewpoints, analysts will be able to apply smart algorithms that produce new insights. The dream is about going beyond reports and producing prognostications.
But here's a thing. There is no such thing as pristine, disinterested data. All data is a product of a conscious design and implementation process. The data we have comprises the answers to the questions we have posed. Data doesn't fall from the sky. It's not natural, and it's not neutral.
If technologists still aren't getting this, it's a sure sign the business is maturing faster than IT professionals. Business is about information. Information is structure.