BeyeNETWORK Belgium Blogs. Copyright BeyeNETWORK 2005 - 2010 http://www.beyenetwork.be/rss/content.php

The Data Quality eBook: When Bad Data Happens to Good Companies

You may have noticed we've slowed down our "In The Field" blog entries, but it's for a good reason. Last week, Baseline launched its latest e-book, co-authored by Baseline consultants and frequent bloggers, Carol Newcomb and Caryn Maresic.

The Data Quality eBook is both a cautionary tale and a nuts-and-bolts toolkit for bringing a set of formalized data quality processes to your company. When the Central Health Alliance discovers just how costly bad data can be, the health care provider launches a data quality program that not only improves services—it can actually save lives. This e-book looks at data issues faced by companies across industries, and shows you how to apply a step-by-step process to prevent over-investment in untrustworthy data and drive business value in the bargain.

The book is currently available for download at Information-management.com.

And now, a brief excerpt from the book:


[Figure: data quality improvement process]

At Central Health Alliance, as with many companies, protracted explanations and guesswork cede to manual effort. If there is a problem hidden in the data, an analyst will surely find it. The question is: how long will that take?

The problem with manual data exploration is that you've got a lot of data, probably a lot more than you know. Data is captured, copied and transformed; it is everywhere in all shapes and forms. When digging through the data, where do you start? More importantly, where do you stop? Unfocused, manual data profiling might lead to interesting discoveries, but it won't get you a cohesive roadmap to better data quality. Moreover, it's hardly scalable.

The right way to improve data quality is by focusing on four incremental steps:

Identify the Business Issue – Defining the business issue and its impact on business operations, strategic goals, or decision making maintains focus for the remainder of this process. The scope of the business issue should be well understood. You might identify several related business issues that have bad data as their core. Or you might have a number of overarching issues, as Central Health Alliance does.

Assess Conformance to Requirements – After your business issue is well understood, it is time to do a data quality assessment. The assessment is a focused effort to determine where in the data lifecycle things go wrong. Central Health Alliance knows its business issues and is poised to kick off the data assessment.

Discover the Root Causes – After you've assessed your data quality issues, it is time to discover why these problems are occurring. What are the root causes? Is there a lack of consistent training for the people who key in data? Is there some buggy code that is moving data around behind the scenes? Maybe there is some confusion about what the data actually means?

Formalize Improvements – Once you know the "what" and the "why," it is time for action. Improving data quality is often a two-pronged effort: you've got to fix what's wrong, and you've got to put a monitoring system in place so that you will know when something goes awry in the future. By fixing the data problem at its source, you not only prevent it from recurring, you also improve the quality of the data in downstream systems.

What are you waiting for? Go download the entire e-book today!



]]>
http://www.beyenetwork.be/blogs/dyche/archives/2010/08/the_data_qualit.php Thu, 26 Aug 2010 06:00:00 MST http://www.beyenetwork.be/blogs/dyche/archives/2010/08/the_data_qualit.php
Data Integration, 80%, and Webinar This Thursday

They say that data integration accounts for 80% of the effort of a data warehousing project (or of a variety of other enterprise applications). But who are "they"? I know that the figure is often presented as the typical resource and time investment for data integration activities, but I have not tracked down a source for it. I seem to recall seeing it in some data warehousing book, but I do not remember which one.

Nonetheless, there is no reason for data integration to consume that amount of effort if the right steps are taken ahead of time to reduce the confusion and complexity of ambiguous semantics and structure. I will discuss these issues at a webinar this Thursday, August 12 - hope you can make it!



]]>
http://www.beyenetwork.be/blogs/loshin/archives/2010/08/data_integratio.php Tue, 10 Aug 2010 06:05:38 MST http://www.beyenetwork.be/blogs/loshin/archives/2010/08/data_integratio.php
Data Governance: "In Action" or "Inaction"?

In the past week I have been sent email from two different organizations offering me information about data governance, and both cases seem to indicate minimal dog food self-ingestion.

The first example is actually the last few of a string of emails that I have received over a nine-month period, each of which is addressed to "Jack." In the past month I have gotten six emails about a webinar on data governance. I responded to the sender three times. The first time I asked whether data quality was part of their talk on data governance, perhaps a tongue-in-cheek way of hoping that they'd notice that my email name ("David Loshin") and their salutation name did not match. No response from them. When I got the next one addressed to Jack, I emailed back saying that my name was not Jack. No response. I responded to the last email I got from them with a simpler question: Does anyone actually respond to emails sent to that email address? Apparently not. They don't know Jack ;-).

The second example is perhaps funnier. The email I received concerned a new white paper including material on "the inability to use information for strategic business advantage" and recognizing data as an asset to "improve customer experiences." Its salutation was "{FIRST_NAME}," which is perhaps a little more correct (I do indeed have a first name even if I don't typically use it), although equally indicative of an absence of oversight on the process of producing the information end-product (i.e., the emails).
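Both slips are the kind of thing a small pre-send control could catch. The following is only a sketch of such a check, not a description of how either sender's system works: the `{TOKEN}` placeholder style is inferred from the salutation quoted above, and every function name here is hypothetical.

```python
from __future__ import annotations

import re

# Assumed token style: upper-case names in braces, e.g. {FIRST_NAME}.
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def render(template: str, fields: dict[str, str]) -> str:
    """Fill each {TOKEN} from `fields`; tokens with no value are left untouched."""
    return PLACEHOLDER.sub(lambda m: fields.get(m.group(1), m.group(0)), template)


def unresolved_placeholders(message: str) -> list[str]:
    """Return the names of merge tokens that survived rendering."""
    return PLACEHOLDER.findall(message)


def greets_by_name(message: str, first_name: str) -> bool:
    """Check that the first line of the message contains the name on record."""
    lines = message.splitlines()
    return bool(lines) and first_name.lower() in lines[0].lower()


def safe_to_send(message: str, first_name: str) -> bool:
    """Hold the email back if either control fails."""
    return not unresolved_placeholders(message) and greets_by_name(message, first_name)
```

With these definitions, a message that still opens with "Dear {FIRST_NAME}," or with "Dear Jack," for a recipient recorded as David would be held back rather than sent.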



]]>
http://www.beyenetwork.be/blogs/loshin/archives/2010/07/data_governance_1.php Wed, 28 Jul 2010 11:00:39 MST http://www.beyenetwork.be/blogs/loshin/archives/2010/07/data_governance_1.php
Thoughts on Parsing and Standardization, and Upcoming Webinar

Last week I had an interesting discussion regarding technical aspects of data cleansing, particularly in the context of acquired data. The challenge posed was that the organization needed to collect data sets from numerous sources with no ability to introduce any types of data controls or data validations. In other words, the data they got was what it was, and if they wanted to use it, they'd have to clean it up themselves.

So the discussion led to talk about tools for cleansing, and I mentioned that most products today provide some means of parsing and standardization as a prelude to entity resolution, matching, and consolidation. In fact, I will be continuing this discussion at a web seminar next week on Parsing and Standardization, and I hope you can attend!
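To make the idea concrete, here is a minimal sketch of parsing followed by standardization for a one-line street address. It is an illustration under simplifying assumptions, not how any particular product works: the suffix table is a tiny made-up sample, and the parser only recognizes the pattern "number, name, suffix".

```python
from __future__ import annotations

# Illustrative lookup table; real tools ship far larger, curated ones.
STREET_SUFFIXES = {
    "st": "Street", "st.": "Street",
    "ave": "Avenue", "ave.": "Avenue",
    "rd": "Road", "rd.": "Road",
}


def parse_address(raw: str) -> dict[str, str]:
    """Parsing: split the raw string into number, street name, and suffix."""
    tokens = raw.strip().split()
    if len(tokens) < 3 or not tokens[0].isdigit():
        raise ValueError(f"unrecognized address pattern: {raw!r}")
    return {"number": tokens[0], "name": " ".join(tokens[1:-1]), "suffix": tokens[-1]}


def standardize_address(parsed: dict[str, str]) -> str:
    """Standardization: map each component to one agreed representation."""
    suffix = STREET_SUFFIXES.get(parsed["suffix"].lower(), parsed["suffix"].title())
    return f'{parsed["number"]} {parsed["name"].title()} {suffix}'
```

Two differently keyed versions of the same address, such as "123 main ST." and "123 MAIN street", both come out as "123 Main Street", which is what lets a later matching step treat them as the same value.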



]]>
http://www.beyenetwork.be/blogs/loshin/archives/2010/07/thoughts_on_par.php Thu, 22 Jul 2010 13:51:40 MST http://www.beyenetwork.be/blogs/loshin/archives/2010/07/thoughts_on_par.php
Semantic Consistency, Master Data Models, and Upcoming Webinar!

In the past week, we have had a number of conversations with folks struggling with specific aspects of data integration for master data management. The main issue is that secondary users of what will eventually be master data are not necessarily bound to abide by the primary users' data definitions. For example, the concept of "customer" means something different to the sales department than it does to those in customer support.

The upshot is that as data element definitions are reinterpreted, the results of sums, counts, and other aggregations start to be skewed. Ultimately, resulting reports are inconsistent, leading to a need for reconciliations, then loss of trust in the master data asset.
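A toy example shows how quickly the counts diverge. The two definitions of "customer" below are hypothetical stand-ins for the sales and support views, and the five records are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    party_id: int
    has_purchased: bool
    has_support_contract: bool


# Two hypothetical departmental definitions of "customer".
def is_customer_for_sales(party: Party) -> bool:
    return party.has_purchased


def is_customer_for_support(party: Party) -> bool:
    return party.has_support_contract


PARTIES = [
    Party(1, True, True),
    Party(2, True, False),
    Party(3, True, False),
    Party(4, False, True),
    Party(5, False, False),
]

sales_count = sum(is_customer_for_sales(p) for p in PARTIES)
support_count = sum(is_customer_for_support(p) for p in PARTIES)
print(sales_count, support_count)  # prints "3 2"
```

Both numbers are a correct "customer count" under their own definition, which is exactly why the resulting reports need reconciling.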

One way to address this is a concerted effort to normalize semantics prior to executing the data consolidation. This may shake out semantic inconsistencies and reduce the need for reconciliations.

More importantly, it implies the need for best practices in developing master data models. To that end, I will be presenting a talk on Accelerating MDM Initiatives with Master Data Modeling at a webinar sponsored by Embarcadero on July 28th. Lots of folks have already signed up, and I hope that it will provide an open forum for discussing some critical issues regarding master data modeling.



]]>
http://www.beyenetwork.be/blogs/loshin/archives/2010/07/semantic_consis.php Wed, 21 Jul 2010 16:52:29 MST http://www.beyenetwork.be/blogs/loshin/archives/2010/07/semantic_consis.php
Responsible [Data] Stewardship

By Caryn Maresic, Senior Consultant

summer reading by Robert S. Donovan via Flickr

Contribute to society and human well-being.   Avoid harm to others.   Be honest and trustworthy.   Be fair and take action not to discriminate.   Those are the first four items in the ACM Code of Ethics.   The ACM, for those who may not be familiar, is the Association for Computing Machinery, whose mission is to advance computing as a science and a profession.

In the course of a recent assignment with a major insurance carrier our team was asked to create various target lists for sales and marketing based on certain selection criteria. While it is likely that all of the things they asked for were legal and ethical, we never questioned it. As good Data Stewards, what should we have done in this case? Should we be asking the business to justify their selection criteria? Should we be checking to make sure there are no legal or ethical violations inherent in the rules? A little research on the topic turned up this presentation, which is very interesting and thought provoking. That being said, it focuses more on the hot-topic issues like privacy and identity theft than it does the ethical dilemmas of sales and marketing.

This article tells the story of an "Agent Profile System" set up by an insurer in Texas to rate its agents. Agents who didn't score well were punished by not getting any new business. The agents filed suit contending this was illegal as it compelled them to drop clients with low credit ratings, low income, and/or those who lived in undesirable locations in order to boost their own score. Is the IT team that built the Agent Profile System responsible, at least in part, for discrimination?

When we are dealing with situations where lives are in danger the ethical answer is clear. For example, no reasonable person would deny that engineers working on Space Shuttle software have a duty to report concerns regarding possible malfunction. In the BI community our issues are not always so clear cut. Sometimes discrimination is good for the business's bottom line, yet still unethical and possibly illegal. If we go back to the statements "Avoid harm to others" and "Be fair" and "take action not to discriminate," it appears that we should take seriously our responsibility to be involved in how the business uses data. In fact, I would argue that we should make ethical considerations part of our data governance program.

photo by Robert S. Donovan via Flickr (Creative Commons License)


Caryn has over 20 years' experience in providing high-quality data solutions to clients in the areas of Business Intelligence, Data Warehousing and System Integration. Caryn has expertise across industries, with an emphasis on Pharmaceutical, Manufacturing, and Insurance. Prior to joining Baseline, she ran her own consulting company.



]]>
http://www.beyenetwork.be/blogs/dyche/archives/2010/07/responsible_dat.php Thu, 15 Jul 2010 06:00:00 MST http://www.beyenetwork.be/blogs/dyche/archives/2010/07/responsible_dat.php
Data Architect – Who is right for the job?

by Caryn Maresic, Senior Consultant

Design

The Data Architect is the core of any BI team. It is important to choose a person with the right skill set. As I tried to put together a list of skills I looked to IT Toolbox and Database Answers for help, but my mind wandered a bit. System Construction. Data Architect. Data Warehouse. Software Factory. We like to portray what we do in terms of construction and/or manufacturing. A recent client bemoaned her department's inability to move from "building custom cars" to "an assembly line." Comparing ourselves to these burly industries might make us feel strong, but does it accurately represent what we aspire to be?

What is a Data Architect?   What should they know how to do?     I borrowed the following description from this article. Before you click, read on and see if you can guess what this is really describing.   I think it is a great description for a Data Architect:

A Data Architect is qualified by education, experience, and imagination to enhance the function and quality of systems. The purpose of this pursuit is to improve the quality of life, increase productivity, and protect the health, security, and welfare of the business.

The best Data Architects are capable of analyzing a client's needs, goals, safety and business requirements and integrating this information into a design that is both pleasing to the eye and functional. They will work with the client closely to develop preliminary design concepts that meet their aesthetic, functional, and economic needs while maintaining adherence to standards.

In essence, the best Data Architects are part detective, part artist, and part psychologist and they use these skill sets to create systems that fit a client's tastes and needs with their budget in mind.

Doesn't that sound like a great job?   Sign me up!   What this is actually describing is an interior designer.   While I doubt that HGTV has any plans to showcase the next dashboard you build, we are indeed closer to Designing Women than Rosie the Riveter!   Stay tuned for future posts on the talents of a good Data Design Star.

photo by Annahape Gallery via Flickr (Creative Commons License)





]]>
http://www.beyenetwork.be/blogs/dyche/archives/2010/07/data_architect.php Thu, 8 Jul 2010 06:00:00 MST http://www.beyenetwork.be/blogs/dyche/archives/2010/07/data_architect.php
I'm All Ears

By Caryn Maresic, Senior Consultant

Mickey Mouse by wrayckage via Flickr Creative Commons

Most Data Warehouse designs include constructs for Address, Phone, and/or Email for Customers.   Len Silverston came up with what he calls a Universal Data Model that does a very good job of abstracting address, email and phone number data.   I have seen clients use the Contact Point portion of his model as-is and with a few simplifications with great success.   That being said, in the area of Marketing and Sales, the manner in which we reach out to our customers and prospects gets more diverse every day.   Disneyland has just partnered with Verizon so that park guests can get real time information about the park and play Disney games on their phones....and, of course, Disney gets access to more information about its customers!

How does this new and ever changing world of communication change the way we think about and model contact points?   What would my ”address” look like if I were near the Haunted Mansion looking for a lunch spot?   Would it be different than if I were at Downtown Disney looking for a cup of coffee?   On Main Street looking for Winnie the Pooh?   In all instances I would be using the same phone, possibly the same IP address, but I would be in different locations which would be important to the marketeers at Disney.

As time goes by (and cell phone GPS systems become more accurate) I suspect that the way we run marketing campaigns to smart phones will be similar to the way in which we use billboards today. Where the customer is physically located at any given time will be as important as the phone number and/or IP address, thus creating a two-dimensional contact point.
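One way to picture a two-dimensional contact point is as an ordinary contact point with an optional, time-stamped location attached. This is a sketch of the idea only, not a rendering of Len Silverston's model; the class names, attribute names, and sample values are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Location:
    latitude: float
    longitude: float
    observed_at: datetime  # a location is only meaningful at a point in time


@dataclass(frozen=True)
class ContactPoint:
    customer_id: int
    mechanism: str  # e.g. "phone", "email", "ip"
    address: str    # the phone number, email address, or IP address itself
    location: Optional[Location] = None  # the second dimension

    def is_two_dimensional(self) -> bool:
        return self.location is not None

    def at(self, location: Location) -> ContactPoint:
        """Return a copy of this contact point observed at `location`."""
        return replace(self, location=location)


# Made-up sample values.
phone = ContactPoint(customer_id=1, mechanism="phone", address="555-0100")
near_lunch_spot = phone.at(Location(33.81, -117.92, datetime(2010, 7, 1, 12, 0)))
```

The phone number stays the same across observations; only the location dimension changes, which is the part a location-aware campaign would key on.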

Have you come across this issue in your organization? Have you changed your data model to include two-dimensional contact points? If not, has the use of smart phones changed your data model in other ways?

photo by wrayckage via Flickr (Creative Commons license)





]]>
http://www.beyenetwork.be/blogs/dyche/archives/2010/07/im_all_ears.php Thu, 1 Jul 2010 06:00:00 MST http://www.beyenetwork.be/blogs/dyche/archives/2010/07/im_all_ears.php
A Data Governance Primer (Part 3 of 3)

By Carol Newcomb, Senior Consultant

Diamond in the Rough: Data Quality

The third part of my summertime primer addresses Data Quality Analysis.   Don't even start a data quality analysis until you have completed the first two steps of your Root Cause Analysis--investigate & prioritize any potential causative factors, and start your metadata assessment.   Otherwise, you may be misled by your findings.

Diamonds
Data quality is defined as complete and accurate data that is ready for business consumption.   Sources of poor data quality may include lack of data entry rules, unclear data element definitions, inconsistent metadata definitions for field type, format or intent, or breakdowns in data transformation processes as data flow between systems or applications.   Poor data quality results in bad business decisions; it contributes to major problems in using data effectively, and costs companies millions of dollars/year in terms of rework and inefficiency.   Data quality, in combination with robust metadata definitions, is part of the foundation of good data governance.

Data Quality Analysis

A Data Quality Management process should be designed to enable an area to start with a simple approach and over time to mature to one that is more proactive and comprehensive.   Initially, investigation may be focused on single data elements or events.   As patterns, data commonalities and other relationships appear, the data quality management process will grow to support complete business processes.     A mature data quality management process will not just resolve individual issues; it will also track relationships between data elements, ensure that business rules are consistent and generate statistical analyses to monitor previously addressed issues to ensure that data quality is stable and that an early warning system is in place as part of the data governance program.   The goal is to design a data quality management lifecycle, as shown in this diagram:

[Figure: data quality management lifecycle]

Initial Data Quality Analysis Process

I. Define data scope
    • Determine data elements that are associated with or are direct results of the reported issue
    • Check that all metadata definitions are present and current
    • Enlist the involvement of the Data SME or Data Stewards
    • Identify all source systems where the data originates, is entered or derived
II. Extract and profile the data
    • Extract the relevant data from all key source systems.
    • Design the profile.   A profile will consist, at a minimum, of total record counts, min/max values, frequency of unique values, and frequency of invalid values (if defined) for each data element profiled.  
    • Profile the data to determine key characteristics that are contributing to the issue, such as:
      1. Wrong values
      2. Missing values
      3. Corrupt transformation processes
      4. Incorrect business rules
      5. Incorrect usage rules
III. Analyze Data Profile Results
    • Summarize the key findings from the profile detail
    • Determine what key drivers are contributing to the impact
    • Determine accountability for the data quality issue
    • Involve other Data Stewards in troubleshooting and designing the data quality solution
IV. Design the Corrective Action Plan

Two types of plans should be developed to address known data quality issues: a corrective action plan to fix the immediate source of the problem identified, and an ongoing monitoring plan, where thresholds have been determined and metrics are routinely collected and reported to data stakeholders.   This monitoring process should be scalable based on the number of data elements being tracked.

    1. Corrective Action Plan
      • Does scope of problem warrant change in metadata definitions, business practices or data entry rules?
      • Does scope of problem warrant a data governance standard?
      • Does the corrective action plan include details on how to fix the source of the problem as well as ways to correct historical data in the system?
    2. Preventive Action Plan
      • This plan will be designed to minimize the probability of data quality issues recurring
      • Determine "early warning triggers" based on designated thresholds. These thresholds should reflect the business tolerance for inaccurate data (is 95% acceptable?)
      • If data latency is the source of a data quality issue, then latency thresholds should be included in the monitoring plan
      • Determine how frequently results of the monitoring plan will be reported to data stakeholders or governance oversight committees
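
The minimum profile from step II and the early warning trigger from the preventive plan can be sketched together in a few lines. This is an illustration under stated assumptions rather than a prescribed implementation: the function names are hypothetical, missing values are assumed to arrive as `None`, and the 95% default simply echoes the example threshold above.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Optional, Sequence


def profile_column(values: Sequence, is_valid: Optional[Callable[[object], bool]] = None) -> dict:
    """Build the minimum profile: counts, min/max, value frequencies, invalid count."""
    present = [v for v in values if v is not None]
    profile = {
        "total_records": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "value_frequencies": Counter(present),
    }
    if is_valid is not None:  # invalid counts only exist when a rule is defined
        profile["invalid"] = sum(1 for v in present if not is_valid(v))
    return profile


def valid_rate(profile: dict) -> float:
    """Share of records that are neither missing nor invalid."""
    total = profile["total_records"]
    bad = profile["missing"] + profile.get("invalid", 0)
    return 1.0 - bad / total if total else 1.0


def breaches_threshold(profile: dict, threshold: float = 0.95) -> bool:
    """Early warning trigger: True when quality falls below business tolerance."""
    return valid_rate(profile) < threshold


# Made-up sample: patient ages with one missing and two implausible values.
ages = [34, 51, None, -1, 34, 220, 47, 29, 63, 58]
age_profile = profile_column(ages, is_valid=lambda age: 0 <= age <= 120)
```

For this sample, three of ten records are missing or invalid, so the trigger fires at a 95% tolerance; run on a schedule, the same check gives the monitoring plan its routinely collected metric.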
So, now that summer is officially here, this wraps up my Data Governance Primer series.   Time for some iced tea and my favorite beach towel.   Come August, these little refreshers might be just the thing!

photo by Swamibu via Flickr (Creative Commons License)


Carol Newcomb is a Senior Consultant with Baseline Consulting. She specializes in developing BI and data governance programs to drive competitive advantage and fact-based decision making. Carol has consulted for a variety of health care organizations, including Rush Health Associates, Kaiser Permanente, OSF Healthcare, the Blue Cross Blue Shield Association and more. While working at the Joint Commission and Northwestern Memorial Hospital, she designed and conducted scientific research projects and contributed to statistical analyses.


]]>
http://www.beyenetwork.be/blogs/dyche/archives/2010/06/a_data_governan_4.php Thu, 24 Jun 2010 06:00:00 MST http://www.beyenetwork.be/blogs/dyche/archives/2010/06/a_data_governan_4.php
A Data Governance Primer (Part 2 of 3)

By Carol Newcomb, Senior Consultant

Minding Your Metadata

The second part of my summertime primer addresses "Minding your Metadata." I can just hear the collective groans and yawns now. Sorry, but metadata collection is one of those necessary evils: it may not be fun in the doing, but having metadata available as a resource to understand your data and use it appropriately is invaluable. And you just might find some interesting surprises along the way!


Metadata: What Is It & Why Do I Need It?

As you start your Root Cause Analysis (see last week's primer), you first need to examine existing data definitions (or lack thereof).   Metadata is the foundation of good data management and forms the basis for Data Governance.     Pardon me for stating the obvious, but metadata is fundamental to investigating and resolving data issues and it is the first place to start when investigating data quality issues.

Metadata is "data about data". Plain and simple. It includes descriptive information about electronic data used in common daily business practice. Metadata includes items usually found in a data dictionary: field name, field length, retention rules, and security access, as well as additional descriptive information that may include data origin (source or system), creation/entry date, method of creation (key-entry or the result of a calculation), purpose of the data (its intended use), how frequently it gets updated or refreshed, and current location in a database (table, view, schema). If a data element is the result of calculation logic or groupings (such as age categories), those business rules used to generate the resulting data values should be collected as part of the metadata.
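
As a rough illustration, the items listed above could be captured in a single record per data element. The class and attribute names below are hypothetical, chosen only to mirror that list, and the sample values are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataElementMetadata:
    # Items usually found in a data dictionary
    field_name: str
    field_length: int
    retention_rule: str
    security_access: str
    # Additional descriptive information
    origin: Optional[str] = None             # source or system
    created_on: Optional[str] = None         # creation/entry date
    creation_method: Optional[str] = None    # key-entry or calculation
    purpose: Optional[str] = None            # intended use
    refresh_frequency: Optional[str] = None  # how often it is updated
    location: Optional[str] = None           # table, view, schema
    business_rule: Optional[str] = None      # calculation or grouping logic

    def undocumented(self) -> list[str]:
        """List the attributes that still have no value recorded."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "")]


# A derived element: the grouping rule is recorded alongside the definition.
age_category = DataElementMetadata(
    field_name="age_category",
    field_length=5,
    retention_rule="7 years",
    security_access="restricted",
    origin="registration system",
    creation_method="calculation",
    business_rule="0-17, 18-64, 65+ derived from date of birth",
)
```

Calling `age_category.undocumented()` returns the attributes nobody has filled in yet, which is a simple way to see how much metadata collection is still outstanding.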

A good example of metadata that you may use every day would be "document properties" in a Word document. This feature captures data on the original document creation date, most recent access and update times, document creator, count of characters, words and pages. If the document should be private, this will be indicated in its properties. You may also tag the document by indicating key words in order to make it easier to find by you or others.

A few of the benefits of Metadata Management include:

  • Clarify rules for data entry
  • Reduce ambiguity around appropriate use of data elements
  • Eliminate problems associated with not having data definitions, business rules or transformation logic available
  • Validate legitimate values at the data element level
  • Provide evidence to regulators that security and confidentiality are protected
  • Centralize the storage and accessibility of metadata for end-users
  • Reduce the amount of effort required to research data results.

A Metadata Management Repository is a central location or system to collect and store metadata that may exist in disparate parts of the organization (data dictionaries, systems, spreadsheets, or people's brains). The metadata repository will store detailed definitions centrally on a network where other users can find it.

There are three general sources of metadata that should be included in this repository:

Business Metadata – Business metadata attributes facilitate identification, understanding, and appropriate use of existing data elements. These include clear business names and descriptions, relevant business rules, descriptions of the data sources, security and privacy rules, etc.
Technical Metadata – Describes the technical attributes of data such as physical location (host server, database server, schema, etc.), data types, any transformations applied and domain of valid values, relationships to other data elements, precision, and lineage. Technical metadata is used by business users and by IT staff to design efficient databases, queries, and applications, and to reduce duplication of data.
Operational Metadata – Describes the attributes of routine operations on data and related statistics. These include job schedules and descriptions, data movement and transformation processes, data read, update and performance statistics, volume statistics, backup and archival information. Operational metadata is used by operations staff and DBAs to tune the system and ensure its continued efficient operation. It is also used by business users to track such events as "last use" of a field, and "last load" of a data element.

Exciting stuff, huh? Well, the whole point of metadata is to have the information about data available to a multitude of users when they need it, to keep it current, and to avoid confusion around usage. So if you appreciate having a clean bathroom, and knowing where you keep your antiperspirant, you will also appreciate having good metadata! The time for spring cleaning is well overdue.




]]>
http://www.beyenetwork.be/blogs/dyche/archives/2010/06/a_data_governan_2.php Thu, 17 Jun 2010 06:00:00 MST http://www.beyenetwork.be/blogs/dyche/archives/2010/06/a_data_governan_2.php
Data Quality and Data Virtualization

As more organizations are starting in earnest to consider deploying master data systems, they are also beginning to see where some fundamental issues may block the creation of a full-scale master data repository. This had me thinking about potential ways around some of these issues (especially where governmental regulations prevent moving data across borders!), and one aspect I started considering is the use of a data federation or virtualization model that can abstract the perception of a unified view without necessarily copying the data.

At the same time, I was approached by Composite Software to do some research on the feasibility of incorporating data quality management techniques within a data virtualization framework. That activity has only continued to pique my interest in integrating virtualization and MDM, and it also allowed me to revisit some ideas I explored in my book on data quality management. Meanwhile, the result of that task is an interesting white paper, which you can access via Composite's web site - scroll down to the "analyst reports" section of the page. Let me know your thoughts!
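
To give a feel for the idea, here is a minimal sketch of a federated view that applies a data quality rule at read time. It is a toy illustration and says nothing about how Composite's product or any other virtualization tool is built: the class name, the `standardize` rule, and the in-memory "sources" are all assumptions.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

Record = dict  # one row from a source system
Source = Callable[[], Iterable[Record]]  # fetches rows on demand


def standardize(record: Record) -> Record:
    """A single illustrative quality rule: trim and title-case the name."""
    return {**record, "name": record["name"].strip().title()}


class FederatedCustomerView:
    """Presents several sources as one view without storing any rows itself."""

    def __init__(self, sources: dict[str, Source]) -> None:
        self._sources = sources

    def records(self) -> Iterator[Record]:
        # Each call goes back to the sources, so nothing is copied into a
        # central store, and the quality rule runs on the way out.
        for source_name, fetch in self._sources.items():
            for record in fetch():
                yield {"source": source_name, **standardize(record)}
```

Because the view holds only the fetch functions, the underlying rows stay where they are and are never modified; consumers simply see a unified, cleansed result each time they read.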



]]>
http://www.beyenetwork.be/blogs/loshin/archives/2010/06/data_quality_an.php Tue, 15 Jun 2010 10:16:39 MST http://www.beyenetwork.be/blogs/loshin/archives/2010/06/data_quality_an.php
A Data Governance Primer (Part 1)

By Carol Newcomb, Senior Consultant


They say that Data Governance is about People, Process and Organization. Much of the design work in planning for data governance is around people's roles and responsibilities, then designing the organizational structure that will provide authority for decisions to be made and enforced. The processes, however, are not new. They are probably already being practiced within your organization, just in a decentralized, informal way. In this blog series, I discuss the processes for 1) investigating and isolating the data quality issues (Root Cause Analysis), 2) starting to collect complete Metadata Definitions, and 3) performing Data Quality Analysis. Only when your governance group has worked through each step, in order, will you be likely to design the appropriate solution.

Root Cause Analysis

The process of data governance is fundamentally very simple.

  1. Identify the data quality issues to address
  2. Prioritize the portfolio of issues to isolate/tackle the most important
  3. Perform Root Cause Analysis to determine the true source of the data issue
  4. Design the corrective action
  5. Formalize the correction through consideration & approval by the Data Governance organization
  6. Implement the fix
  7. Monitor the results

It seems like when we start to map out the discrete steps involved in the data governance process, much of the work is already being done in informal ways throughout the organization.   What some folks don't realize is that data governance is often nothing more than formalizing a whole bunch of informal processes that either don't get communicated, or aren't accepted as a data standard.

Root Cause Analysis is the process of identifying probable causes of a data issue, and isolating the contributing factors.   In order to resolve any particular issue, root cause analysis involves fact-finding, drilling into details of the problem, talking to the right people, and separating out other associated (but not contributing) factors.

A standard tool for supporting the detailed findings is the Ishikawa Diagram, below.  

[Figure: Ishikawa Diagram]

To conduct a thorough Root Cause Analysis, use the following checklist:
  • Diagnose the problem as if you are a physician or a detective. Consider all possible sources of the symptom. Don't rule anything out yet!
  • Boil the ocean—be exhaustive and creative.
  • Don't practice problem solving before collecting all possible causes.
  • Practice the "5 Whys": don't stop asking "Why" until you have exhausted every conceivable potential reason.
  • Rank the factors if possible.   Identify the Primary causes versus the Secondary or associated factors.
  • Rule out each possible factor one at a time.   Justify why (you may need to come back to this later).
  • Find all potential business process and data owners to involve them in your understanding of the possible sources of the problem.
  • Share the findings with everyone involved in troubleshooting. They could rule out certain factors with their knowledge.
  • Test your hypotheses with actual data.    
  • Fix the problem and test again.
  • Publish/share your findings and fixes.   Communicating your findings may reveal additional factors you hadn't considered.
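
Testing a hypothesis with actual data can start with something as small as a grouped count. The sketch below is only an illustration: it assumes hypothetical records with a `source_system` field and a suspect `birth_date` field (neither name comes from a real system), and it checks the hypothesis that missing values come mainly from one source.

```python
from collections import defaultdict

def null_rate_by_source(records, field, source_key="source_system"):
    """Return {source: fraction of records where `field` is missing or empty}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        src = rec[source_key]
        totals[src] += 1
        if rec.get(field) in (None, ""):
            missing[src] += 1
    return {src: missing[src] / totals[src] for src in totals}

# Hypothetical sample rows, for illustration only.
records = [
    {"source_system": "registration", "birth_date": "1970-01-01"},
    {"source_system": "registration", "birth_date": "1982-05-17"},
    {"source_system": "billing", "birth_date": None},
    {"source_system": "billing", "birth_date": ""},
    {"source_system": "billing", "birth_date": "1990-10-10"},
]
rates = null_rate_by_source(records, "birth_date")
```

If one source shows a much higher rate than the others, it becomes a Primary cause candidate; if the rates are similar everywhere, that factor can be ruled out and the justification recorded.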

After a thorough Root Cause Analysis has been completed, Data Stewards should proceed to Metadata Analysis and Data Quality Analysis.   These two techniques will be discussed in my next blogs.


Carol Newcomb is a Senior Consultant with Baseline Consulting. She specializes in developing BI and data governance programs to drive competitive advantage and fact-based decision making. Carol has consulted for a variety of health care organizations, including Rush Health Associates, Kaiser Permanente, OSF Healthcare, the Blue Cross Blue Shield Association and more. While working at the Joint Commission and Northwestern Memorial Hospital, she designed and conducted scientific research projects and contributed to statistical analyses.



]]>
http://www.beyenetwork.be/blogs/dyche/archives/2010/06/a_data_governan_1.php Thu, 10 Jun 2010 06:00:00 MST http://www.beyenetwork.be/blogs/dyche/archives/2010/06/a_data_governan_1.php
Change always comes bearing gifts
A story.....
  • Vendor X sells its ERP to a company in Healthcare;
  • Client wishes to setup its informational environment (data sharing, BI, CPM etc..) right from the start;
  • Vendor X pushes the 'standard solution' they sell;
  • Client decides to decouple their informational environment from its source(s) for several reasons (heterogeneous sources, sustainability, compliance, adaptability etc..);
  • Vendor X deploys their ERP;
  • Client starts to design and build the informational environment;
  • Interfaces between ERP of vendor X and the informational environment are developed;
  • The ERP of vendor X does not offer functional interfaces ('X keeps pushing their standard product'), so client needs to connect on the physical level;
  • Go-live is near, for both the ERP and the new informational environment.

And then change management of vendor X regarding the ERP kicks in.

Client: 'What's your release schedule for patches'?
X: 'Every 2 weeks'
Client: 'Huh'?

Client thinks: 'Damn, how can I keep up with this change schedule?'

Client: 'Well, can you tell me anything regarding the average impact of these patches?'
X: 'Well, they can be very small and very big'

Client thinks: 'Ok, what are you NOT telling me'

Client: 'Ok, but this ERP is like 15 years old, so give me an overview of the average impact'
X: 'Basically anything can happen'

Client thinks: 'o, o'

Client: 'Ok, but the majority of these changes are of course situated in the application layer, not the data layer?'
X: 'Well..anything can happen.'

Client thinks: 'Is it warm in here?'

Client: 'Anything? Also in the data layer? Table changes, integrity changes, domain type changes, value changes?'
X: 'Aye'

Client thinks: 'Ok - I'm dead'

Client: '...at least tell me that existing structures always remain intact and the data remains auditable - extend instead of replace for example'
X: 'Huh'?

Client thinks: 'Well, at least I am healthy...'

Client: 'hmm...just a side note, we use Change Data Capture, I assume that these changes are fully logged?'
X: 'Nah - log is turned off, otherwise we can't deploy the changes'

Client thinks: '..hmm....is my resume up to date?'


My point: do not assume that your vendor (of any system) practices professional application development or follows a change management policy that takes into account the simple fact that the data in these information systems needs to be shared with other information systems in your company.

Change management and professional application development need to be important criteria in the selection of information systems.
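
When the vendor will not announce data layer changes, the client can at least detect them. The sketch below is a minimal illustration, not a description of any vendor's tooling: it assumes you can capture a snapshot of table and column definitions before and after each patch (the `{table: {column: type}}` shape and the sample tables are hypothetical) and it lists what changed.

```python
def diff_schemas(before, after):
    """Compare two {table: {column: type}} snapshots and list the changes."""
    changes = []
    for table in sorted(set(before) | set(after)):
        if table not in after:
            changes.append(("table_dropped", table, None))
            continue
        if table not in before:
            changes.append(("table_added", table, None))
            continue
        old_cols, new_cols = before[table], after[table]
        for col in sorted(set(old_cols) | set(new_cols)):
            if col not in new_cols:
                changes.append(("column_dropped", table, col))
            elif col not in old_cols:
                changes.append(("column_added", table, col))
            elif old_cols[col] != new_cols[col]:
                changes.append(("type_changed", table, col))
    return changes

# Hypothetical snapshots taken before and after a patch.
before = {"patient": {"id": "int", "name": "varchar(50)"}, "visit": {"id": "int"}}
after = {"patient": {"id": "bigint", "name": "varchar(50)", "email": "varchar(100)"}}
changes = diff_schemas(before, after)
```

A check like this only covers structural changes; value changes and integrity changes would need separate profiling of the data itself.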




]]>
http://www.beyenetwork.be/blogs/damhof/archives/2010/06/change_always_c.php Tue, 8 Jun 2010 14:29:39 MST http://www.beyenetwork.be/blogs/damhof/archives/2010/06/change_always_c.php
IT-Business Alignment: Let The Business Drive

By Caryn Maresic, Senior Consultant

Julia's parents were planning a vacation. Her mother thought Pensacola would be a great destination: she'd heard so much about the wildlife, especially the dolphins! Her father wanted to see the National Naval Aviation Museum and the Blue Angels. Since Julia had traveled extensively, her parents asked her to make all the arrangements. While having dinner with them to discuss plans, she jotted down the following notes:

  • Location:   Moderately-priced hotel close to water/sights.
  • Budget: $3,000 for transportation and accommodations.
  • Activities:   Beach and nature activities (Mom), science/historic sights (Dad)
  • Duration: 10 days.

Julia felt honored that her parents trusted her to get the job done. After doing some online research, she made all the reservations and met with her parents to review them. She eagerly awaited the look on her parents' faces as they scanned the vacation itinerary and read through the glossy brochures.

"Hawaii?" they said in unison. "We didn't want to go to Hawaii!"

"Honey, we chose Florida because we can drive there. I don't want to fly anymore. Flying is such a pain," Dad grumbled.

"I appreciate what you've done, Julia, but an old friend of mine lives near Pensacola and I was hoping to visit while we were there," said Mom.

"But, Mom!" exclaimed Julia. "You said you wanted beaches, dolphins, sunny weather. Dad, you like science and history. What about Pearl Harbor? You two can't go to the Gulf Coast. What about the oil spill?"

What happened here is typical of what happens to IT projects all the time.   It's easy to say that we wouldn't do what Julia did.   Would we?   Don't we oftentimes:

  • Interview the business and record the requirements in an abstract way.
  • Believe that we can deliver something better than what the business asked for.
  • Assume that the business lacks the capability to understand the technology.
  • Fail to get all of the requirements.   Not exactly our fault, but still a problem.
  • Neglect to keep the business involved in the process.

There has been a lot of buzz on IT-Business alignment of late, including this article on some specific companies that are going the extra mile (Beyond Alignment) as well as this one on lack of user involvement (Why IT Projects Fail: Lack of User Involvement). Most companies aren't as progressive. The willingness to work together has to occur at all levels. Only when we let the business drive can we deliver, if not what they asked for, then at least something useful.

photo by stevendepolo via Flickr (Creative Commons license)


Caryn has over 20 years of experience in providing high-quality data solutions to clients in the areas of Business Intelligence, Data Warehousing and System Integration. Caryn has expertise across industries with an emphasis in Pharmaceutical, Manufacturing, and Insurance. Prior to joining Baseline, she ran her own consulting company.



]]>
http://www.beyenetwork.be/blogs/dyche/archives/2010/06/it-business_ali.php Thu, 3 Jun 2010 06:00:00 MST http://www.beyenetwork.be/blogs/dyche/archives/2010/06/it-business_ali.php
Agile Analytics: Put Your SaaS in the Sand

By Rob Paller, Consultant

Recently at a client, the data warehouse administrator was asked to define a sandbox environment in the production data warehouse for analysts and developers working on a small project. The idea behind this sandbox was to allow the team a working area for collaboration and intermediate storage of results while working with the data in a purely ad hoc capacity. It was recognized instantly that this could be the start of something bigger within the organization, something that could not currently be provided by the incumbent business intelligence tools. The response had to be formulated quickly in order to avoid stifling the creativity of the analysts or, worse, the progress of the project. But care had to be taken as well: if managed incorrectly, the sandbox could get out of hand and become a waste of system resources and a drain on human resources that had already been spread thin. The business unit in question was looking to move beyond the confines of the current business intelligence environment and push the edges.

This was a group of analysts that wanted to get their hands dirty and weren't afraid to fail. They wanted to mash data together in ways that the business intelligence tools, with their controlled ad hoc environments, could not previously support. This was data mining for the next set of KPIs that would shape the way the business moves forward.

The concept of agile analytics is not new; eBay presented on and blogged about this concept in 2008. The idea at this client was simple. By leveraging the existing enterprise data warehouse system to house the sandbox environments, the duplication of data is all but eliminated. Groups interested in sharing data between their sandbox environments are strongly discouraged from doing so until the data has been properly integrated into the production environment. The sandbox environments would also be given a short life expectancy at their inception to prevent the prototypes from becoming production and the data from ending up in a wasteland. This all sounded great on paper.
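
The short life expectancy is easier to enforce if each sandbox carries a creation date that can be checked on a schedule. The sketch below is one hypothetical way to do that, not the client's actual process: the sandbox names, the record layout and the 90-day limit are all assumptions made for illustration.

```python
from datetime import date, timedelta

def expired_sandboxes(sandboxes, today, max_age_days=90):
    """Return names of sandboxes older than max_age_days, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    stale = [s for s in sandboxes if s["created"] < cutoff]
    return [s["name"] for s in sorted(stale, key=lambda s: s["created"])]

# Hypothetical sandbox registry, for illustration only.
sandboxes = [
    {"name": "kpi_prototype", "created": date(2010, 1, 4)},
    {"name": "churn_mashup", "created": date(2010, 5, 3)},
]
stale = expired_sandboxes(sandboxes, today=date(2010, 5, 27))
```

The list that comes back is a prompt for a decision: either the prototype is promoted through proper integration into production, or the sandbox is dropped.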

In the midst of a development architecture overview, a brief conversation among a few enterprise architects uncovered the potential Screw-Me Scenario that could bring the concept of agile analytics to an untimely demise. "The users of the data warehouse are not permitted to write ad hoc queries outside of a controlled business intelligence tool. They might write a bad query." Thanks for the warning; we'll be sure to refine our pitch to the enterprise architects to defuse this scenario before it turns ugly.

In Oliver Ratzesberger's presentation for eBay's Analytics as a Service, he acknowledges that the metrics we already know are cheap and the unknown metrics are expensive. But the known metrics are not pushing the edges. Known metrics are found in the middle of the box. Agile analytics is about pushing the edges of how your enterprise data warehouse is used to improve response to the needs of the business. It is about the evolution of the user community from one that plays in controlled ad hoc environments to one that is encouraged to experiment with new ideas and not to fear failing along the way. Agile analytics is about encouraging your users to reach out for the edges and P U S H. Only once the edges are stretched can the middle of the box be redefined.

photo by edenpictures via Flickr (Creative Commons License)


Rob Paller is an expert at business analytics and database administration. Since joining Baseline, Rob has been responsible for developing a case analysis system to streamline the oversight of food assistance benefits, implementing a common citizen data model, and assisting in the rollout of a new public assistance data model integrating data from over 10 years of legacy with a new benefit eligibility determination system.

]]>
http://www.beyenetwork.be/blogs/dyche/archives/2010/05/agile_analytics.php Thu, 27 May 2010 06:00:00 MST http://www.beyenetwork.be/blogs/dyche/archives/2010/05/agile_analytics.php