How much does poor data quality cost business? There's no doubt that acting on information that is wrong or missing can lead to process failure or revenue loss. Events of this type tend to be the ones that draw attention to data quality issues. However, there are less noticeable failures and losses that stem from not acting at all because information was faulty or absent. Events of this kind are only recognized with hindsight. But while it may be easy to categorize types of data quality costs, it's much harder to quantify them.
David Loshin, author of a 2010 book on data quality, has a blog post that rounds up various estimates of data quality costs. He finds quoted figures ranging from 8% to 35% of operating revenue. These figures span the time period 1998 to 2009. Loshin tentatively concludes that the perceived costs of poor data quality are rising, but he also calls for more hard research to be published so that those of us who care about business information can get a better picture.
The impact of poor data quality is ultimately, like everything else, measurable. If it wasn't measurable, then it wouldn't matter. I suspect the official reason why companies don't target this metric is the difficulty of separating out the contribution of data quality from other factors in a failure or loss. I'm pretty sure the unofficial reason is that people find it hard enough to nurture a quality culture around visible and tangible quantities; asking them to consider the impact of invisible, abstract entities seems like going too far.
I think we can tackle the first objection to measuring data quality by insisting that three questions are embedded in every business decision point:
1: What items of information are used in making this decision?
2: What is the acceptable tolerance for error in each item?
3: What is the net acceptable tolerance for error in this collection of items?
Question 1 may seem obvious, and it's the basic stuff of systems development. However, not all the information consumed in a decision process is captured in systems designs. This is increasingly the case as data from emails or web sites is used.
Question 2 is harder to answer. Answering it implies providing a business value for business data. The IT profession has long talked about data as an asset. This question takes that rhetoric seriously.
And Question 3 is what I guess we can call – taking inflation into account and considering the interconnectedness of today's businesses – the sixty-four trillion dollar question.
Certain combinations of data items have net values different than simple addition. Some data collections have the effect of reducing risk. In insurance, for example, the more you know about an insurable item, the more accurate your rating will be. On the other hand, some data collections generate additional uncertainty. This is particularly true of speculative data sets, such as customer behavior models. I know from experience that getting a solid answer to Question 1 in the most prominent parts of an enterprise is pretty hard.
Applying Question 1 right across the business is even harder, and harder to justify, even though it's often the peripheral parts of a business that trigger catastrophic failure through incidents of poor data governance. Yet rolling out Question 1 is achievable. We know how to do it. I suggest that we don't really know how to even pose Questions 2 and 3, let alone answer them.
I say “tolerance for error”, but I don't know what units that tolerance is measured in. Is it a unit derived from the data type itself? Is it a standard corporately defined unit? Whatever the units are, they have to be convertible into cash. I don't think we see enough discussion of these issues.
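To make the three questions a little more concrete, here is a minimal sketch of how they might be recorded against a single decision point. Everything in it is an assumption for illustration: the class names, the example items, the cash figures, and above all the choice to express tolerance as the largest cash loss we would accept. It is not an established method, just one way of showing that the net tolerance has to be stated separately from the item tolerances.

```python
from dataclasses import dataclass, field


@dataclass
class InformationItem:
    """Question 1: one item of information used in a decision."""
    name: str
    # Question 2: acceptable error for this item, expressed here
    # (by assumption) as the largest cash loss we would accept from it.
    tolerance_cash: float


@dataclass
class DecisionPoint:
    """A business decision and the information it consumes (hypothetical)."""
    name: str
    items: list[InformationItem] = field(default_factory=list)
    # Question 3: net tolerance for the whole collection of items.
    # It is recorded on its own because it need not equal the simple sum.
    net_tolerance_cash: float = 0.0

    def summed_item_tolerance(self) -> float:
        """Simple addition of the per-item tolerances."""
        return sum(item.tolerance_cash for item in self.items)

    def combination_effect(self) -> float:
        """Stated net tolerance minus the simple sum of item tolerances."""
        return self.net_tolerance_cash - self.summed_item_tolerance()


# Illustrative figures only; none of these numbers come from real data.
rating = DecisionPoint(
    name="rate a property policy",
    items=[
        InformationItem("construction type", 500.0),
        InformationItem("postcode", 300.0),
    ],
    net_tolerance_cash=600.0,
)

print(rating.summed_item_tolerance())
print(rating.combination_effect())
```

A non-zero combination effect is the point of the exercise: it marks a collection of items whose net value differs from simple addition, which is exactly what Question 3 asks about.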
The good news is that we have the Insurance Data Management Association (IDMA), the organization for data management professionals. Members of IDMA are dedicated to addressing data governance and management issues. We need more attention paid to data. Take a look at their website and get involved. Also see the blog from David Loshin.