By Carol Newcomb, Senior Consultant
Diamond in the Rough: Data Quality
The third part of my summertime primer
addresses Data Quality Analysis. Don’t even start a data quality
analysis until you have completed the first two steps of your Root
Cause Analysis: investigate and prioritize any potential causative
factors, and start your metadata assessment. Otherwise, you may be
misled by your findings.
Data Quality Analysis
A Data Quality Management process should be designed so that an area can start with a simple approach and, over time, mature to one that is more proactive and comprehensive. Initially, investigation may be focused on single data elements or events. As patterns, data commonalities and other relationships appear, the data quality management process will grow to support complete business processes. A mature data quality management process will not just resolve individual issues. It will also track relationships between data elements, ensure that business rules are consistent, and generate statistical analyses to monitor previously addressed issues, so that data quality remains stable and an early warning system is in place as part of the data governance program. The goal is to design a data quality management lifecycle, as shown in this diagram:
Initial Data Quality Analysis Process
I. Define data scope
- Determine data elements that are associated with or are direct results of the reported issue
- Check that all metadata definitions are present and current
- Enlist the involvement of the Data SME or Data Stewards
- Identify all source systems where the data originates, is entered or derived
II. Extract and profile the data
- Extract the relevant data from all key source systems.
- Design the profile. A profile will consist, at a minimum, of total record counts, min/max values, frequency of unique values, and frequency of invalid values (if defined) for each data element profiled.
- Profile the data to determine key characteristics that are contributing to the issue, such as:
- Wrong values
- Missing values
- Corrupt transformation processes
- Incorrect business rules
- Incorrect usage rules
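As a rough illustration of the minimum profile described above, here is a small Python sketch. The function name `profile_column`, the `is_valid` rule, and the sample age values are all hypothetical, chosen only for this example; a real profile would run against data extracted from your source systems and use the validity rules defined in your metadata.

```python
from collections import Counter

def profile_column(values, is_valid=None):
    """Profile one data element: record counts, min/max, value
    frequencies, and (if a rule is supplied) the invalid-value count."""
    # Treat None and empty strings as missing values.
    present = [v for v in values if v is not None and v != ""]
    frequencies = Counter(present)
    profile = {
        "total_records": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "unique_values": len(frequencies),
        "frequencies": dict(frequencies),
    }
    # Invalid-value frequency is reported only when a validity rule is defined.
    if is_valid is not None:
        profile["invalid"] = sum(1 for v in present if not is_valid(v))
    return profile

# Hypothetical sample: patient ages with one missing and two implausible values.
ages = [34, 51, None, 34, -2, 130, 47]
age_profile = profile_column(ages, is_valid=lambda v: 0 <= v <= 120)
print(age_profile)
```

Even a profile this small separates missing values from wrong values, which is usually the first question to answer before looking at transformation processes or business rules.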
III. Analyze Data Profile Results
- Summarize the key findings from the profile detail
- Determine what key drivers are contributing to the impact
- Determine accountability for the data quality issue
- Involve other Data Stewards in troubleshooting and designing the data quality solution
IV. Design the Corrective Action Plan
Two types of plans should be developed to address known data quality issues: a corrective action plan to fix the immediate source of the problem identified, and an ongoing monitoring plan, where thresholds have been determined and metrics are routinely collected and reported to data stakeholders. This monitoring process should be scalable based on the number of data elements being tracked.
- Corrective Action Plan
- Does the scope of the problem warrant a change in metadata definitions, business practices or data entry rules?
- Does the scope of the problem warrant a data governance standard?
- Does the corrective action plan include details on how to fix the source of the problem as well as ways to correct historical data in the system?
- Preventive Action Plan
- This plan will be designed to minimize the probability that data quality issues recur
- Determine ‘early warning triggers’ based on designated thresholds. These thresholds should reflect the business tolerance for inaccurate data (is 95% acceptable?)
- If data latency is the source of a data quality issue, then latency thresholds should be included in the monitoring plan
- Determine how frequently results of the monitoring plan will be reported to data stakeholders or governance oversight committees
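The early warning triggers above can be sketched in a few lines of Python. This is only an illustration under assumed values: the 95% validity threshold echoes the question posed above, while the 12-hour latency limit, the function names `valid_rate` and `early_warnings`, and the sample data are hypothetical and would be replaced by whatever tolerances your business stakeholders agree on.

```python
from datetime import datetime, timedelta

def valid_rate(values, is_valid):
    """Share of records that pass the validity rule (0.0 for an empty set)."""
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)

def early_warnings(rate, min_rate, last_loaded, now, max_latency):
    """Return a warning message for each threshold that is breached."""
    warnings = []
    if rate < min_rate:
        warnings.append(
            f"valid rate {rate:.1%} is below threshold {min_rate:.0%}"
        )
    if now - last_loaded > max_latency:
        warnings.append(
            f"data is {now - last_loaded} old, above limit {max_latency}"
        )
    return warnings

# Hypothetical monitoring run: 8 of 10 ages are plausible, and the
# last load happened 24 hours before the check.
ages = [34, 51, 34, -2, 130, 47, 29, 62, 18, 75]
rate = valid_rate(ages, lambda v: 0 <= v <= 120)
warnings = early_warnings(
    rate,
    min_rate=0.95,                          # assumed business tolerance
    last_loaded=datetime(2010, 6, 23, 6, 0),
    now=datetime(2010, 6, 24, 6, 0),
    max_latency=timedelta(hours=12),        # assumed latency threshold
)
for message in warnings:
    print(message)
```

The list of warnings is what would be rolled up and reported to data stakeholders or the governance oversight committee at whatever frequency the monitoring plan sets.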
So, now that summer is officially here, this wraps up my Data Governance Primer series. Time for some iced tea and my favorite beach towel. Come August, these little refreshers might be just the thing!
photo by Swamibu via Flickr (Creative Commons License)
Carol Newcomb is a Senior Consultant with Baseline Consulting. She
specializes in developing BI and data governance programs to drive
competitive advantage and fact-based decision making. Carol has
consulted for a variety of health care organizations, including Rush
Health Associates, Kaiser Permanente, OSF Healthcare, the Blue Cross
Blue Shield Association and more. While working at the Joint Commission
and Northwestern Memorial Hospital, she designed and conducted
scientific research projects and contributed to statistical analyses.
Posted June 24, 2010 6:00 AM