Blog: David Loshin
Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!

About the author

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions, including information quality consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at (301) 754-6350.

Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!

It is currently the holiday break, which means two things. First, almost everybody is taking time off, which means that (second) there is a little bit of breathing room for us to sit and ponder issues pushed into the background during the rest of the year. One of those items has to do with data quality scorecards, data issue severity, and setting levels of acceptability for data quality scores.

Essentially, if you can determine some assertion that describes your expectation for quality within one of the commonly used dimensions, then you are also likely to be able to define a rule that can validate data against that assertion. Simple example: the last name field of a customer record may not be null. This assertion can be tested on a record-by-record basis, or I can even extract the entire set of violations from a database using a SQL query.
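To make this concrete, here is a minimal sketch of the "extract the violations with a SQL query" approach, using an in-memory SQLite database. The table and column names (customer, last_name) and the sample rows are my own illustrative assumptions, not anything from a real system:

```python
import sqlite3

# Hypothetical customer table; names and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, last_name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Smith"), (2, None), (3, "Jones"), (4, None)],
)

# The assertion: last_name may not be null.
# The violations are exactly the rows that fail it.
violations = conn.execute(
    "SELECT id FROM customer WHERE last_name IS NULL"
).fetchall()
print(len(violations))  # raw count of violating records -> 2
```

The same assertion could just as easily be applied one record at a time inside a data-entry application; the set-oriented query is simply the batch version of the rule.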

Either way, I can get a score, perhaps either a raw count of violations, or a ratio of violating records to the total number of records; there are certainly other approaches to formulating a "score," but this simple example is good enough for our question: how do you define a level of acceptability for this score?
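As a sketch of the two simple scoring formulations mentioned above (a raw count versus a ratio of violating records to total records), with the function name and figures being my own illustrative choices:

```python
def violation_ratio(violation_count: int, total_records: int) -> float:
    """Return the fraction of records that violate the assertion."""
    if total_records == 0:
        return 0.0  # an empty data set has no violations to report
    return violation_count / total_records

# e.g. 2 null last names out of 1,000 customer records
print(violation_ratio(2, 1000))  # -> 0.002
```

Whether you report the raw count (2) or the ratio (0.002), the question that follows is the same: at what value does this score become unacceptable?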

The approach I have been considering compares the relative financial impact associated with the occurrence of the error(s) against the various alternatives to address them. On one side of the spectrum, the data stewards can completely ignore the issue, allowing the organization to absorb the financial impacts. On the other side of the spectrum, the data stewards can invest in elaborate machinery to not only fix the current problem, but also ensure that it will never happen again. Other alternatives fall somewhere between these two ends, but where?

To answer this question, let's consider the economics. Ignoring the problem means that some financial impact will be incurred, but there is no cost of remediation. The other end of the spectrum may involve a significant investment, but may address issues that occur sporadically, if at all, so the remediation cost is high but the value may be low.

So let's consider one question and see if that helps. At some point, the costs associated with ignoring a recurring issue equal the cost of preventing the impact in the first place (either by monitoring for the error or preventing it altogether). We can define that as the tolerance point - any more instances of that issue make prevention worthwhile. And this establishes one level of acceptability - the maximum number of errors that can be ignored.

Calculating this point requires two data points: the business impact cost per error, and the cost of prevention. The rest is elementary arithmetic - multiply the per-error impact cost by the number of occurrences to get the total business impact, then subtract the prevention cost. If you end up with a positive number, then it would have been worth preventing the errors.
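The arithmetic can be sketched in a few lines. All of the dollar figures below are invented for illustration; the post supplies the method, not the numbers:

```python
# Illustrative figures only - assumptions, not data from the post.
impact_per_error = 25.0    # business impact cost of one error, in dollars
prevention_cost = 10000.0  # one-time cost of the monitoring/prevention machinery

# The tolerance point: the error count at which the cumulative impact
# of ignoring the issue equals the cost of preventing it.
tolerance_point = prevention_cost / impact_per_error
print(tolerance_point)  # -> 400.0 errors

# For an observed number of errors, the elementary arithmetic:
# total impact minus prevention cost. Positive means prevention
# would have been worth it.
errors_observed = 600
net = errors_observed * impact_per_error - prevention_cost
print(net)  # -> 5000.0, so prevention would have paid for itself
```

With these made-up figures, any error count above 400 puts you past the tolerance point, which is exactly the "maximum number of errors that can be ignored" described above.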

My next pondering: how can you model this? More to follow...

Posted December 30, 2008 11:45 AM