Metadata & Data Quality

Originally published 9 September 2010

Data does not live in isolation. It acquires meaning when you turn it into information that allows you to take action. The sequence from data to information to action revolves around metadata. Metadata is anything you need to know in order to interpret data. Business metadata provides the context required to imbue data with meaning.

Data quality and metadata are connected by a two-way street. From one end, you need (good) metadata to ensure data quality. From the other end, data quality information is itself valuable (business) metadata. Inmon et al. (2007): “... adding metadata context about data quality gives the organization reasonable assurance that the figures accurately represent the correct values.” By consistently gathering data profiling statistics, you can show the business how data quality is improving or degrading over time.
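As a rough illustration of gathering profiling statistics over time, the sketch below profiles one column at two snapshot dates and compares the null rate. The function name profile_column, the chosen statistics, and the sample "postcode" values are assumptions made for this example; they are not taken from any particular profiling tool.

```python
from datetime import date


def profile_column(values, snapshot_date):
    """Return simple profiling statistics for one column at one point in time."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    return {
        "snapshot_date": snapshot_date,
        "row_count": total,
        # Share of missing values; 0.0 for an empty column to avoid dividing by zero
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        "distinct_count": len(set(non_null)),
    }


# Hypothetical snapshots of a customer "postcode" column
january = profile_column(["1011", None, "1012", None], date(2010, 1, 1))
june = profile_column(["1011", "1013", "1012", None], date(2010, 6, 1))

# A negative change means fewer missing values, i.e. quality improved
trend = june["null_rate"] - january["null_rate"]
print(f"null rate changed by {trend:+.2f}")
```

Stored alongside the data, a series of such snapshots is itself business metadata: it tells users how far they can trust a column today and whether that trust is growing or eroding.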

Along with the proliferation of systems, metadata management has gained in importance. Because knowledge workers spend so much time looking for context to interpret their data, efficiency gains from better metadata can be huge. And since our reliance on IT is unlikely to wane anytime soon, this trend will persist.

Technical Versus Business Metadata

The generic term metadata carries several meanings. For the purpose of this article, we’ll make a distinction between technical and business metadata. Although it is certainly justified to position these on a continuum, we’ll discuss them as being two separate classes of metadata.

Technical metadata encompasses everything that is required to make a database (a relational database management system, or RDBMS) run. Think of indexes, record counts, table sizes, the relations between tables, field types and lengths, primary and foreign keys, etc. It provides a technical description of data and might also include table or attribute names and their descriptions.
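To make this concrete, here is a minimal sketch that reads some technical metadata back from a database. It uses SQLite through Python's standard sqlite3 module purely for convenience; the customer and orders tables are invented for the example, and other RDBMSs expose the same kind of information through their own catalog views.

```python
import sqlite3

# Throwaway in-memory database with two hypothetical tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        amount REAL
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    INSERT INTO customer (id, name) VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders (id, customer_id, amount) VALUES (1, 1, 9.5);
""")

# Field names and types: table_info rows are (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(orders)")]

# Relations between tables: foreign_key_list rows are (id, seq, table, from, to, ...)
foreign_keys = [
    (row[3], row[2], row[4])
    for row in conn.execute("PRAGMA foreign_key_list(orders)")
]

# Indexes: index_list rows are (seq, name, unique, origin, partial)
indexes = [row[1] for row in conn.execute("PRAGMA index_list(orders)")]

# Record count
record_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print(columns, foreign_keys, indexes, record_count)
```

Everything retrieved here is maintained by the database itself, which is what sets technical metadata apart: it exists whether or not anybody documents it.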

Business metadata enables correct interpretation of data so that you can turn it into information. Examples are column headers in a report or in Excel, units of measurement, a timestamp of when data was gathered, notions about the quality or accuracy of data, etc. However, the bulk of business metadata is simply floating around in people’s heads. The primary purpose of business metadata is to place information in a wider context.
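One possible way to get such context out of people's heads is to record it next to the number it describes. The sketch below is only an illustration: the BusinessMetadata class, its fields, and the revenue example are hypothetical and simply mirror the examples listed above (column header, unit, timestamp, quality note).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BusinessMetadata:
    label: str             # column header as shown in a report
    unit: str              # unit of measurement
    gathered_at: datetime  # when the data was gathered
    quality_note: str      # what is known about accuracy


def describe(value, meta):
    """Render a bare value together with the context needed to interpret it."""
    return (
        f"{meta.label}: {value} {meta.unit} "
        f"(as of {meta.gathered_at:%Y-%m-%d}; {meta.quality_note})"
    )


# Hypothetical metadata for a revenue figure
revenue_meta = BusinessMetadata(
    label="Net revenue",
    unit="EUR thousands",
    gathered_at=datetime(2010, 9, 1),
    quality_note="excludes unbooked returns",
)

print(describe(1250, revenue_meta))
```

The bare value 1250 says very little on its own; with the label, unit, date, and quality note attached, a reader can decide what action it supports.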

Defining Data Quality and Metadata

Discussions about data quality can sometimes turn into an almost philosophical debate. There appears to be a continuum running from quality being an inherent property of data, to quality being a function of its use by some person or organization. In the former case quality is measured by conformance of data with physical properties “out there.” In the latter case, you measure data quality by the consequences (process failures) of using it.

Note that everywhere along this continuum metadata plays a crucial, albeit somewhat different role. For proponents of the absolute (“inherent”) data quality position, metadata must be of immaculate quality. Otherwise you lose the connection to the outside objects being represented, and therefore your measurements of quality become meaningless.

When you consider data quality a function of its use by some person, then again, metadata plays a crucial role in ensuring that the data is used properly. Without proper knowledge of the context of data, how else can you know what decisions to base on it?

This leads to a remarkable finding about metadata. Accuracy can be less of an issue for certain non-critical data. This is not the case for metadata, however. No metadata component can be ignored. Metadata must be of such quality that business and technical users can rely on it. Always. If not, either measures of inherent quality become meaningless, or measures of relative data quality are compromised because the data gets used the wrong way.

The Value of Metadata

Obviously there is a cost involved when you invest in a centralized metadata repository. Why should you choose to do so? When my accountant asks me to specify my expenses, I probably spend 98% of my time finding the receipts, 1% adding up the numbers, and another 1% reporting the total back to her. Searching for the receipts is the equivalent of trying to get access to metadata: it amounts to determining exactly which receipts I should report.

Metadata saves enormous amounts of time when trying to find a particular piece of information. Productivity studies have shown that when you distract a colleague, it may cost up to 15 minutes for this person to get back into his flow of work. So on top of the time you both spend trying to make sense of a piece of data, at least one colleague gets distracted. And hopefully the first person you turn to knows what you seek to find out.

Instead of constantly having to bother colleagues, it makes sense to store the collective “tribal” knowledge within your organization in a centralized repository. Eventually everybody will form a habit of turning to it. This will both improve efficiency and lower dependency on certain key employees.

When your data and metadata are scattered throughout the organization, nobody can have confidence in the data. The result is more people spending more time verifying numbers and trying to make sense of data. All this time is a net waste: it adds no value whatsoever and can never be recouped.

Just like “regular” data from disparate silos, metadata needs to be brought together as well. You consolidate metadata to reduce the cost of operations, but also to enhance organizational knowledge and provide better (customer) service. Unless people can find all metadata in one central place (with one single log-on), they will not “learn” where to find it. Needless to say, the usability of this interface is an important factor in adoption.

Metadata and Flossing

Metadata is like flossing in many ways. Most BI professionals buy into the importance of recording metadata, so knowledge and awareness are there. The same holds for flossing: your dentist tells you how important it is, and we agree. We know it is good for us, and we know we should be doing it. But somehow, it just doesn’t get done, or at least not often enough. The bad thing with metadata is that once it starts to decay, its value drops exponentially. And once the value is gone, nobody is motivated to do anything to restore it. So we have a pretty strong self-reinforcing loop here.

If you are aware of this dynamic, it is obvious that you need to make updating and expanding metadata a top priority. So reward it commensurately, and continue to do so. There are just too many excuses not to do it, and without sustained attention the effort falls into a vicious circle that will eventually drive it into the ground. Changing this dynamic requires persistence, and incessantly advertising the value of good metadata. Just like the need for flossing…

Organizing Metadata Ownership

Once the value of owning a metadata repository is clear, the question arises: How do we get started? One of the central questions is where to position metadata ownership in the organization. There are interesting parallels with “regular” data ownership.

Ownership of data is the process of exercising sole authority over the resources being governed. Hence, it is (often) possible to assign ownership of data to a particular business line. The business manager in charge of processes that create the data is a natural proprietor.

Ownership of enterprise processes, procedures, and business metadata, however, can rarely be assigned to one business unit. In most organizations this isn’t achievable, because it requires coordination of resources from multiple business units. When “true” ownership isn’t feasible, corporations tend to favor concepts of stewardship instead.

Data stewardship is the non-technical portion of responsibilities for the partnership in data management. These concepts are similar for data and metadata alike. Just like data in your enterprise data warehouse are the amalgam of multiple source systems (with different owners), metadata arises as a joint effort across business units.

Conclusion

Metadata repository management is often described as a mini data warehouse process. The launch might be a project, but after that it turns into a process. Inmon et al. (2007): “Building an enterprise metadata repository is not a short-term project. And even after it has been built, it requires an ongoing commitment to keep it current and accurate.”

Business lines have the ultimate responsibility for the data and metadata content. When multiple business units are involved as data gets created, data ownership needs to be replaced by data stewardship. Business units “own” the data, and have knowledge about its creation, its peculiarities, and how it gets used. IT acts “merely” as a custodian or caretaker of the physical implementation. IT may be held responsible for data management and the associated metadata repository. But the content should be managed by the business, preferably as independently as possible.

Knowledge capture will only happen if it is easy. It should fit into the daily routine of business people. If they have to go out of their way to record what they know, it won’t happen. Likewise, access to metadata has to be easy. Otherwise your repository will not get used, and you will not benefit from the possible efficiency gains.

References:

Bill Inmon, Bonnie O’Neil & Lowell Fryman (2007), Business Metadata: Capturing Enterprise Knowledge. ISBN 0123737265

David Marco (2000), Building and Managing the Metadata Repository. ISBN 0471355232

Hans Wegener (2007), Aligning Business and IT with Metadata: The Financial Services Way. ISBN 0470030313


Tom Breur, Principal with XLNT Consulting, has a background in database management and market research. For the past 10 years, he has specialized in how companies can make better use of their data. He is an accomplished teacher at universities, MBA programs and for the Certified Business Intelligence Professional (CBIP) program. He is a regular keynoter at international conferences. Currently, he is a member of the editorial board of the Journal of Targeting, the Journal of Financial Services Management and Banking Review. He acts as an advisor for The Council of Financial Competition and the Business Banking Board and was cited among others in Harvard Management Update about state-of-the-art data analytics. His company, XLNT Consulting, helps companies align their IT resources with corporate strategy, or in plain English, he helps companies make more money with their data. For more information you can email him at tombreur@xlntconsulting.com or call +31646346875.

     
