Data Quality: Cost or Profit Center?

Originally published 17 March 2010

Data quality has typically been grunt work. It’s unsexy because errors are seen as “problems,” and people who (repeatedly) raise problems are generally not liked very much. This dynamic does not work very well, regardless of whether you are delivering or receiving bad news.

Data quality errors are rarely caused by purely technical malfunctions. More often than not, a broken business process is the root cause. Factual information that pinpoints when and how often your business processes break down is extremely useful; in fact, it is all but necessary to make improvements. This is also exactly the kind of factual information that business executives love to get from data analysts. It completely changes the dynamic from raising “problems” to pointing toward areas for “improvement.”

All this demonstrates that there is tremendous value in capturing “bad” data. And once you turn bad data into better business practices, you can monetize what this change is worth to the business. By constantly aligning with primary processes within the business, you can turn data stewardship from a cost into a profit center.

A Need for Data Warehousing

Traditionally in data warehousing, we have taken source data and applied extract, transform, load (ETL) operations to clean, scrub and move data to our data warehouse (DWH) star schemas. We don’t want bad data going into the DWH, because that means it would show up in corporate reports. This might trigger business users to question the accuracy of the DWH, which is supposed to be the trusted source for integrated corporate data.
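
As a minimal illustration of what such a cleansing step might look like, the sketch below (Python, with hypothetical field names and validation rules chosen purely for illustration) checks source rows before they reach the star schema. Note that failing rows are set aside rather than silently discarded, so the “bad” data stays available for exactly the kind of analysis argued for above.

```python
# Minimal sketch of an ETL validation step (field names and rules are
# illustrative assumptions, not a prescription). Rows that pass are loaded
# into the warehouse; rows that fail are quarantined, not thrown away.
VALID_GENDERS = {"M", "F"}

def validate_customer_row(row: dict) -> list[str]:
    """Return the list of data quality violations found in one source row."""
    errors = []
    if not row.get("customer_id"):
        errors.append("missing customer_id")
    if row.get("gender") not in VALID_GENDERS:
        errors.append(f"unexpected gender code: {row.get('gender')!r}")
    if row.get("income") is None or row["income"] < 0:
        errors.append("income missing or negative")
    return errors

def split_clean_and_rejected(rows):
    """Separate loadable rows from quarantined rows with their violations."""
    clean, rejected = [], []
    for row in rows:
        errors = validate_customer_row(row)
        if errors:
            rejected.append({"row": row, "errors": errors})
        else:
            clean.append(row)
    return clean, rejected

if __name__ == "__main__":
    source = [
        {"customer_id": "C001", "gender": "F", "income": 42000},
        {"customer_id": "C002", "gender": "X", "income": None},
    ]
    clean, rejected = split_clean_and_rejected(source)
    print(f"loaded: {len(clean)}, quarantined: {len(rejected)}")
```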

When you derive numbers from disparate, unconnected source systems, the numbers probably won’t align. As businesses grow more complex and ever more digitized, we’ve seen an overwhelming proliferation of data streams. This becomes too much to handle, even for diehard spreadsheet warriors. Consistent information becomes commensurately more difficult to produce.

This dynamic has driven the quest for our holy grail: one single version of the truth. Either front-end systems are connected through enterprise application integration (EAI) and enterprise information integration (EII) solutions or the data is integrated on the back end or both.

In many organizations, however, confusion reigns. Spreadmarts and isolated databases are all too frequently the norm. When you ask what the “single version of the truth” means, you are likely to get as many different answers as the number of people you ask. That doesn’t sound very singular.

A Single Version of the Truth

In all fairness, “the truth” isn’t always obvious, not even when it’s staring you in the face. The hole in the ozone layer had been there for many years, and data on it had been recorded. However, the measurements seemed so far out of line with normal expectations that they were habitually discarded as errors. Sometimes the blatantly obvious can remain invisible, simply because we don’t want to see it.

This ozone example illustrates another point: what we consider to be “the truth” is an amalgam of facts and their interpretation. And the same holds for data warehousing. We record facts, but our interpretation of the truth only arises after we apply our model of “the world.” In data warehousing we force-fit fundamentally irreconcilable data streams (filtering out errors) to arrive at the truth. We should never forget that the business rules we apply to turn facts into “the truth” are themselves subject to change as our model of what is real and what is not evolves.

A DWH records facts, and as such can become the central (historical) repository. Our interpretation may well change over time, so we could choose to “rewrite” history as our interpretation of past facts changes. And it does change, just as the hole in the ozone layer turned out to have existed much earlier than we thought. This is what I mean when I say that a DWH is the single source of the facts and, thus, not of the truth.

What is the Single Version of the Truth?

If you think about it, truth really seems more of a philosophical or religious concept. Even mathematics offers no truth, only axiomatic derivations from prior statements. Empirical sciences like physics have no truth either, merely hypotheses that last until someone proves them wrong.

For business intelligence (BI) professionals, a single version of the truth seems to equate to multiple colleagues all agreeing on a common interpretation of facts as recorded in a (central) database. When no one feels the urge anymore to corroborate database facts by triangulating with real-world observations (or other database systems), we buy into the truth. One could say that a common version of the truth equates to sharing the same point of view and “speaking the same language.”

Where does “Bad Data” Originate?

What is bad data, really? We usually call data “bad” when it doesn’t conform to management’s expectation of what transpired. A client is recorded as “M” when she is evidently female, or a database field is left empty when a real-world value exists. In short, bad data occurs when the content of the database does not accurately reflect the real-world situation.

There are fundamentally two mechanisms that lead to inaccurate data. First, the record may have been correct at the time it was captured, but the real-world situation has changed in the meantime. Second, the record may have been wrong from the outset. As soon as you capture information, it starts to decay. Some attributes, like gender, tend to change (very) infrequently (at least in the outside world). Others, like employment status, change more often. Although these mechanisms differ, in both cases some business process is defective.
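
A minimal sketch of how these two mechanisms might be told apart in practice, assuming a hypothetical last-verified date per attribute and decay thresholds that are purely illustrative:

```python
# Sketch: distinguishing stale data (decayed) from data that was wrong at capture.
# The attribute names, value sets and decay thresholds are illustrative assumptions.
from datetime import date, timedelta

# Rough "half-life" assumptions: how long before an attribute becomes suspect.
DECAY_THRESHOLDS = {
    "gender": timedelta(days=365 * 20),       # changes very infrequently
    "employment_status": timedelta(days=180)  # changes far more often
}

def is_stale(attribute: str, last_verified: date, today: date) -> bool:
    """Correct at capture, but possibly no longer true: the record has decayed."""
    return today - last_verified > DECAY_THRESHOLDS[attribute]

def is_invalid(attribute: str, value) -> bool:
    """Wrong from the outset: the value could never have reflected reality."""
    if attribute == "gender":
        return value not in {"M", "F"}
    if attribute == "employment_status":
        return value not in {"employed", "self-employed", "unemployed", "retired"}
    return False

today = date(2010, 3, 17)
print(is_stale("employment_status", date(2009, 1, 15), today))  # True: due for re-verification
print(is_invalid("gender", "X"))                                # True: bad at capture
```

The thresholds capture the point made above: gender rarely needs re-verification, whereas employment status decays much faster.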

Let’s look at an example. Suppose a credit card customer provides comprehensive information when applying for a card. Later she wants a limit raise. Her income statement could still be accurate, or her situation might have changed. Obviously, if a limit raise isn’t backed up by sufficient income because she has lost her job, that would be undesirable for all involved. By granting additional credit, the card issuer could be making a costly misclassification error.

Broken Business Processes

What do we mean when we say a business process is broken? A broken business process is a drop or breakdown in the value-creating chain of your primary process. So, for example, if you are a manufacturer, your value chain is creating products from raw materials. When you are in the hospitality business, you sell holidays, travel plans or accommodations, which are “assembled” from components. A retail company manages inventory to avoid stock-outs as well as inventory write-offs.

In the case of a card issuer, you sell cash management capability and, in return, you ask for interest (and some interchange fees). The credit you extend needs to match the payment capabilities of cardholders. There will always be a small and manageable default rate, but lending decisions should be based on policy rules regarding the discretionary income of applicants and on credit scoring (based on past lending and payment behavior). If not, your value-creating process breaks down, because your credit portfolio will deteriorate. Rising default losses hit your bottom line exponentially.

Granting credit without sufficient income from the applicant is a breakdown of your process. Pinpointing when (and how often) and where this occurs is valuable business information. If you can surface such misclassification errors and then fix them, value creation is propelled forward. By the same token, the overwhelming majority of data quality errors point to failures in business processes, and rarely to breakdowns in technology (e.g., erroneous ETL programming).
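
To make “pinpointing when and how often” concrete, here is a hedged sketch; the record layout, the 12-month income-verification window and the three-times-income rule are assumptions for illustration only. It counts limit raises granted on stale or insufficient income information, by month:

```python
# Sketch: turning record-level data quality flags into a monthly process metric.
# The table layout, the 12-month verification window and the 3x income rule
# are illustrative assumptions, not actual lending policy.
from collections import Counter
from datetime import date

limit_raises = [
    # granted date, new limit, declared income, date income was last verified
    {"granted": date(2010, 1, 12), "new_limit": 9000,  "income": 2500, "income_verified": date(2009, 11, 3)},
    {"granted": date(2010, 1, 28), "new_limit": 15000, "income": 2000, "income_verified": date(2008, 5, 20)},
    {"granted": date(2010, 2, 9),  "new_limit": 6000,  "income": 3200, "income_verified": date(2010, 1, 15)},
]

def is_process_breakdown(raise_, max_income_age_days=365, limit_to_income_cap=3):
    """Flag raises granted on stale or insufficient income information."""
    stale = (raise_["granted"] - raise_["income_verified"]).days > max_income_age_days
    over_extended = raise_["new_limit"] > limit_to_income_cap * raise_["income"]
    return stale or over_extended

breakdowns_by_month = Counter(
    r["granted"].strftime("%Y-%m") for r in limit_raises if is_process_breakdown(r)
)
print(breakdowns_by_month)  # e.g. Counter({'2010-01': 2})
```

Reported alongside the credit extended on the flagged raises, this is the kind of number that turns a data quality finding into input for managing the primary process.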

Driving out the drops in your value-creating chain is the fastest and most effective way to generate shareholder value. As such, data quality errors are, at the root, the “nuggets of gold” that point to improvements in your primary process capabilities. This turns typical data quality improvement efforts on their head: it moves them from a cost center to a value-creating opportunity.

Instead of dealing with data quality issues purely as problems, an alternative perspective is possible. Data quality issues might occasionally be caused by technical hiccups, but they are usually the result of broken business processes, so they are as much an opportunity as they are a problem. A broken business process is a disruption in value creation. You can use data quality as the royal road to value creation: every time you analyze patterns in data quality problems and pinpoint the root causes where drops in value creation occur, you provide invaluable input to management.

When you make this translation from data quality as a “source of problems” to an “opportunity for improved value creation,” the entire dynamic turns around. Perennial DWH issues around creating a single version of the truth take on new meaning. Whenever departments disagree on the truth, a breakdown occurs in value creation that needs to be restored. And BI experts are typically excellently placed to add value to that discussion. If, for example, sales and logistics disagree on the “true” inventory, value is lost. Either you have stock sitting that nobody is trying to sell (because the sales department is unaware of its existence), or you face stock-outs: you have an interested customer, but you can’t deliver the product. The same holds for all non-technical causes of data quality problems, which in our assessment account for 80% or more of cases.
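
A minimal sketch of that kind of triangulation, using two made-up per-SKU stock extracts to stand in for the sales and logistics systems:

```python
# Sketch: surfacing disagreement between two source systems on "true" inventory.
# The SKU codes and quantities are invented for illustration.
sales_view     = {"SKU-001": 120, "SKU-002": 0,  "SKU-003": 45}
logistics_view = {"SKU-001": 120, "SKU-002": 35, "SKU-004": 10}

for sku in sorted(set(sales_view) | set(logistics_view)):
    s, l = sales_view.get(sku), logistics_view.get(sku)
    if s != l:
        # Each mismatch is either unsellable stock (sales unaware it exists)
        # or a looming stock-out (sales promising what logistics cannot deliver).
        print(f"{sku}: sales says {s}, logistics says {l}")
```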

You can shed new light on disparities in your corporate information environment and turn yesterday’s problems into tomorrow’s opportunities. A single version of the truth may still be what you are pursuing, which is noble. But instead of pouring money into fixing problems, the enlightened perspective is that patterns in data quality issues, when presented in conjunction with insight into value creation, are a tremendously valuable source of insight to drive the business forward.

Tom Breur, Principal with XLNT Consulting, has a background in database management and market research. For the past 10 years, he has specialized in how companies can make better use of their data. He is an accomplished teacher at universities, MBA programs and for the Certified Business Intelligence Professional (CBIP) program. He is a regular keynoter at international conferences. Currently, he is a member of the editorial board of the Journal of Targeting, the Journal of Financial Services Management and Banking Review. He acts as an advisor for The Council of Financial Competition and the Business Banking Board and was cited in, among others, Harvard Management Update about state-of-the-art data analytics. His company, XLNT Consulting, helps companies align their IT resources with corporate strategy, or in plain English, he helps companies make more money with their data. For more information you can email him at tombreur@xlntconsulting.com or call +31646346875.
