Blog: Jill Dyché

Jill Dyché

There you are! What took you so long? This is my blog and it's about YOU.

Yes, you. Or at least it's about your company. Or people you work with in your company. Or people at other companies that are a lot like you. Or people at other companies that you'd rather not resemble at all. Or it's about your competitors and what they're doing, and whether you're doing it better. You get the idea. There's a swarm of swamis, shrinks, and gurus out there already, but I'm just a consultant who works with lots of clients, and the dirty little secret - shhh! - is my clients share a lot of the same challenges around data management, data governance, and data integration. Many of their stories are universal, and that's where you come in.

I'm hoping you'll pour a cup of tea (if this were another Web site, it would be a tumbler of single-malt, but never mind), open the blog, read a little bit and go, "Jeez, that sounds just like me." Or not. Either way, welcome on in. It really is all about you.

About the author

Jill is a partner and co-founder of Baseline Consulting, a technology and management consulting firm specializing in data integration and business analytics. Jill is the author of three acclaimed business books, the latest of which is Customer Data Integration: Reaching a Single Version of the Truth, co-authored with Evan Levy. Her blog, Inside the Biz, focuses on the business value of IT.

Editor's Note: More articles and resources are available in Jill's BeyeNETWORK Expert Channel. Be sure to visit today!


By Carol Newcomb, Senior Consultant

Diamond in the Rough: Data Quality

The third part of my summertime primer addresses Data Quality Analysis. Don’t even start a data quality analysis until you have completed the first two steps of your Root Cause Analysis: investigate and prioritize any potential causative factors, and start your metadata assessment. Otherwise, you may be misled by your findings.


[Image: Diamonds]

Data quality is defined as complete and accurate data that is ready for business consumption. Sources of poor data quality may include a lack of data entry rules; unclear data element definitions; inconsistent metadata definitions for field type, format, or intent; or breakdowns in data transformation processes as data flows between systems or applications. Poor data quality results in bad business decisions, contributes to major problems in using data effectively, and costs companies millions of dollars per year in rework and inefficiency. Data quality, in combination with robust metadata definitions, is part of the foundation of good data governance.

Data Quality Analysis

A Data Quality Management process should let an area start with a simple approach and mature over time into one that is more proactive and comprehensive. Initially, investigation may focus on single data elements or events. As patterns, data commonalities, and other relationships appear, the data quality management process will grow to support complete business processes. A mature data quality management process does not just resolve individual issues; it also tracks relationships between data elements, ensures that business rules are consistent, and generates statistical analyses to monitor previously addressed issues. That monitoring confirms that data quality is stable and puts an early warning system in place as part of the data governance program. The goal is to design a data quality management lifecycle, as shown in this diagram:


[Figure: data quality management lifecycle diagram]

Initial Data Quality Analysis Process

I. Define data scope


    • Determine data elements that are associated with or are direct results of the reported issue

    • Check that all metadata definitions are present and current

    • Enlist the involvement of the Data SME or Data Stewards

    • Identify all source systems where the data originates, is entered, or is derived
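
The scoping checks above can be automated against a data dictionary. What follows is only a minimal sketch in Python, not a prescribed tool: the `ElementDefinition` record, its field names, and the one-year "current" window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class ElementDefinition:
    """One data dictionary entry (hypothetical structure)."""
    name: str
    definition: Optional[str]
    source_systems: List[str]
    steward: Optional[str]
    last_reviewed: Optional[date]


def scope_gaps(elements: List[ElementDefinition], today: date,
               max_age_days: int = 365) -> Dict[str, List[str]]:
    """Return {element name: [problems]} for elements not ready to profile."""
    gaps: Dict[str, List[str]] = {}
    for e in elements:
        problems = []
        if not (e.definition and e.definition.strip()):
            problems.append("missing definition")
        # "Current" is assumed to mean reviewed within max_age_days.
        if e.last_reviewed is None or today - e.last_reviewed > timedelta(days=max_age_days):
            problems.append("definition not current")
        if not e.steward:
            problems.append("no data steward assigned")
        if not e.source_systems:
            problems.append("no source system identified")
        if problems:
            gaps[e.name] = problems
    return gaps


# Illustrative data only; element and system names are made up.
elements = [
    ElementDefinition("admit_date", "Date the patient was admitted",
                      ["ADT"], "J. Doe", date(2010, 1, 15)),
    ElementDefinition("dx_code", None, [], None, None),
]
for name, problems in scope_gaps(elements, today=date(2010, 6, 24)).items():
    print(name, "->", ", ".join(problems))
```

Any element that shows up in the result still needs attention from a Data Steward before it is worth extracting and profiling.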



II. Extract and profile the data


    • Extract the relevant data from all key source systems.

    • Design the profile. A profile will consist, at a minimum, of total record counts, min/max values, frequency of unique values, and frequency of invalid values (if defined) for each data element profiled.

    • Profile the data to determine key characteristics that are contributing to the issue, such as:


      1. Wrong values

      2. Missing values

      3. Corrupt transformation processes

      4. Incorrect business rules

      5. Incorrect usage rules
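
To make the minimum profile concrete, here is a hedged sketch in Python of profiling a single data element. The function name `profile_element`, the treatment of `None` and empty strings as missing, and the `is_valid` rule are assumptions for illustration; a real profile would follow whatever validity rules your metadata defines.

```python
from collections import Counter


def profile_element(values, is_valid=None):
    """Profile one data element: record count, missing values, min/max,
    frequency of unique values, and frequency of invalid values (if a
    validity rule is supplied). Assumes the values are mutually comparable."""
    # Assumption: None and empty strings count as missing values.
    present = [v for v in values if v is not None and v != ""]
    frequencies = Counter(present)
    profile = {
        "total_records": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "unique_values": len(frequencies),
        "value_frequencies": dict(frequencies),
    }
    if is_valid is not None:
        invalid = Counter(v for v in present if not is_valid(v))
        profile["invalid"] = sum(invalid.values())
        profile["invalid_frequencies"] = dict(invalid)
    return profile


# Illustrative extract of a gender code field; the valid set is an assumption.
extract = ["M", "F", "F", "", None, "X", "f"]
result = profile_element(extract, is_valid=lambda v: v in {"M", "F"})
for key, value in result.items():
    print(f"{key}: {value}")
```

In this made-up extract, the invalid frequencies separate a stray code (`X`) from a casing problem (`f`), which is the kind of distinction that points toward either a data entry rule or a transformation step as the cause.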





III. Analyze Data Profile Results


    • Summarize the key findings from the profile detail

    • Determine what key drivers are contributing to the impact

    • Determine accountability for the data quality issue

    • Involve other Data Stewards in troubleshooting and designing the data quality solution



IV. Design the Corrective Action Plan

Two types of plans should be developed to address known data quality issues: a corrective action plan to fix the immediate source of the identified problem, and an ongoing monitoring plan, in which thresholds have been determined and metrics are routinely collected and reported to data stakeholders. This monitoring process should be scalable based on the number of data elements being tracked.



    1. Corrective Action Plan


      • Does the scope of the problem warrant a change in metadata definitions, business practices, or data entry rules?



      • Does the scope of the problem warrant a data governance standard?

      • Does the corrective action plan include details on how to fix the source of the problem as well as ways to correct historical data in the system?



    2. Preventive Action Plan


      • This plan will be designed to minimize the probability of data quality issues recurring

      • Determine ‘early warning triggers’ based on designated thresholds. These thresholds should reflect the business tolerance for inaccurate data (for example, is 95% accuracy acceptable?)

      • If data latency is the source of a data quality issue, then latency thresholds should be included in the monitoring plan

      • Determine how frequently results of the monitoring plan will be reported to data stakeholders or governance oversight committees
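
The early warning triggers described above can be sketched as a simple threshold check. This is an illustration under stated assumptions, not a definitive monitoring design: the `Threshold` record, the shape of the `metrics` dictionary, and the sample numbers are all hypothetical, and the 95% figure is only the example tolerance mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Business tolerance for one data element (hypothetical structure)."""
    element: str
    min_valid_rate: float                      # 0.95 means 95% of records must be valid
    max_latency_hours: Optional[float] = None  # set only if latency is a known issue


def early_warnings(thresholds: List[Threshold],
                   metrics: Dict[str, dict]) -> List[str]:
    """Compare collected metrics with thresholds and return warning messages.

    metrics maps element name to {"total": int, "valid": int} with an
    optional "latency_hours" entry.
    """
    warnings = []
    for t in thresholds:
        m = metrics.get(t.element)
        if m is None or m["total"] == 0:
            warnings.append(f"{t.element}: no data received")
            continue
        rate = m["valid"] / m["total"]
        if rate < t.min_valid_rate:
            warnings.append(
                f"{t.element}: valid rate {rate:.1%} is below the {t.min_valid_rate:.0%} threshold")
        latency = m.get("latency_hours")
        if (t.max_latency_hours is not None and latency is not None
                and latency > t.max_latency_hours):
            warnings.append(
                f"{t.element}: latency of {latency} hours exceeds {t.max_latency_hours} hours")
    return warnings


# Illustrative thresholds and metrics; all values are made up.
thresholds = [Threshold("dx_code", 0.95), Threshold("admit_date", 0.95, 24)]
metrics = {
    "dx_code": {"total": 1000, "valid": 930},
    "admit_date": {"total": 1000, "valid": 990, "latency_hours": 36},
}
for warning in early_warnings(thresholds, metrics):
    print(warning)
```

Because each tracked element is just one more `Threshold` entry, the same check scales with the number of data elements being monitored, and its output can be what gets reported to data stakeholders at the agreed frequency.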





Carol_fig2
So, now that summer is officially here, this wraps up my Data Governance Primer series. Time for some iced tea and my favorite beach towel. Come August, these little refreshers might be just the thing!

photo by Swamibu via Flickr (Creative Commons License)


Carol Newcomb is a Senior Consultant with Baseline Consulting. She specializes in developing BI and data governance programs to drive competitive advantage and fact-based decision making. Carol has consulted for a variety of health care organizations, including Rush Health Associates, Kaiser Permanente, OSF Healthcare, the Blue Cross Blue Shield Association and more. While working at the Joint Commission and Northwestern Memorial Hospital, she designed and conducted scientific research projects and contributed to statistical analyses.



Posted June 24, 2010 6:00 AM