Data does not live in isolation. It acquires meaning when you turn it into information that allows you to take action. The sequence from data to information to action revolves around metadata. Metadata is anything you need to know in order to interpret data. Business metadata provides the context required to imbue data with meaning.
There’s a two-way street between data quality and metadata. In one direction, you need (good) metadata to ensure data quality. In the other direction, data quality information is valuable (business) metadata. Inmon et al. (2007): “... adding metadata context about data quality gives the organization reasonable assurance that the figures accurately represent the correct values.” By consistently gathering data profiling statistics, you can show the business how data quality is improving or degrading over time.
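As a rough illustration of gathering profiling statistics over time, the sketch below profiles the same column on two dates and compares the share of missing values. The ProfileSnapshot structure, the function names, and the sample values are assumptions made for this example, not part of any particular profiling tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ProfileSnapshot:
    """Data quality statistics for one column on one day (hypothetical structure)."""
    column: str
    taken_on: date
    row_count: int
    null_rate: float      # share of missing values, 0.0 to 1.0
    distinct_count: int


def profile_column(column, values, taken_on):
    """Compute simple profiling statistics for a list of values."""
    missing = sum(1 for v in values if v is None or v == "")
    present = [v for v in values if v is not None and v != ""]
    null_rate = missing / len(values) if values else 0.0
    return ProfileSnapshot(column, taken_on, len(values), null_rate, len(set(present)))


def null_rate_trend(snapshots):
    """Return the change in null rate between the oldest and newest snapshot.

    A negative number means fewer missing values, so quality improved.
    """
    ordered = sorted(snapshots, key=lambda s: s.taken_on)
    return ordered[-1].null_rate - ordered[0].null_rate


# Illustrative data only: the same column profiled on two dates.
january = profile_column("postal_code", ["1011", None, "", "2595"], date(2024, 1, 1))
february = profile_column("postal_code", ["1011", "3511", None, "2595"], date(2024, 2, 1))
trend = null_rate_trend([january, february])
```

Stored alongside the data, snapshots like these become business metadata in their own right: the trend value is what you would show the business to demonstrate improvement or decay.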
Along with the proliferation of systems, metadata management has gained in importance. Because knowledge workers spend so much time looking for context to interpret their data, efficiency gains from better metadata can be huge. And since our reliance on IT is unlikely to wane anytime soon, this trend will persist.
Technical Versus Business Metadata
The generic term metadata carries several meanings. For the purpose of this article, we’ll make a distinction between technical and business metadata. Although it is certainly justified to position these on a continuum, we’ll discuss them as being two separate classes of metadata.
Technical metadata encompasses everything a database (relational database management system, or RDBMS) requires in order to run. Think of indexes, record counts, table sizes, the relations between tables, field types and lengths, primary and foreign keys, etc. It provides a technical description of data and might also include table or attribute names, and their descriptions.
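Most of this technical metadata can be read straight from the database catalog. The sketch below uses SQLite (through Python's standard sqlite3 module) purely as an example RDBMS; the technical_metadata function and the customer and orders tables are hypothetical names chosen for illustration, and other databases expose the same facts through their own catalog views.

```python
import sqlite3


def technical_metadata(conn, table):
    """Collect technical metadata for one table from SQLite's catalog."""
    cur = conn.cursor()
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = cur.execute(f'PRAGMA table_info("{table}")').fetchall()
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
    foreign_keys = cur.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
    # PRAGMA index_list rows: (seq, name, unique, ...)
    indexes = cur.execute(f'PRAGMA index_list("{table}")').fetchall()
    record_count = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    return {
        "field_types": {c[1]: c[2] for c in columns},
        "primary_key": [c[1] for c in columns if c[5] > 0],
        # (local column, referenced table, referenced column)
        "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in foreign_keys],
        "indexes": [ix[1] for ix in indexes],
        "record_count": record_count,
    }


# Illustrative schema and rows, created in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name VARCHAR(50));
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        amount REAL
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO orders VALUES (1, 1, 19.95), (2, 1, 5.00);
""")
meta = technical_metadata(conn, "orders")
```

Note that nothing in this output tells you what an order amount means or in which currency it is expressed. That gap is what business metadata fills.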
Business metadata enables correct interpretation of data so that you can turn it into information. Examples are column headers in a report or in Excel, units of measurement, a timestamp of when data was gathered, indications of data quality such as accuracy, etc. However, the bulk of business metadata is simply floating around in people’s heads. The primary purpose of business metadata is to place information in a wider context.
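To make the examples above concrete, here is a minimal sketch of how business metadata for a single column might be recorded once it leaves people's heads. The BusinessMetadata class, its fields, and the sample values are all assumptions made for illustration; a real repository would define its own structure.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BusinessMetadata:
    """Context a reader needs to interpret one column (all fields are illustrative)."""
    column: str
    label: str                 # column header shown in a report
    unit: Optional[str]        # unit of measurement, if any
    gathered_at: datetime      # when the data was gathered
    quality_note: str          # what is known about accuracy

    def report_header(self):
        """Build a header that carries the unit along with the label."""
        return f"{self.label} ({self.unit})" if self.unit else self.label


# Illustrative values only.
revenue = BusinessMetadata(
    column="rev_amt",
    label="Net revenue",
    unit="EUR x 1,000",
    gathered_at=datetime(2024, 1, 31, 23, 59),
    quality_note="Estimated for the last three days of the month",
)
header = revenue.report_header()
```

Here the header reads "Net revenue (EUR x 1,000)", so a reader of the report no longer has to ask a colleague which unit the figures are in.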
Defining Data Quality and Metadata
Discussions about data quality can sometimes turn into an almost philosophical debate. There appears to be a continuum running from quality being an inherent property of data, to quality being a function of its use by some person or organization. In the former case quality is measured by conformance of data with physical properties “out there.” In the latter case, you measure data quality by the consequences (process failures) of using it.
Note that everywhere along this continuum metadata plays a crucial, albeit somewhat different role. For proponents of the absolute (“inherent”) data quality position, metadata must be of immaculate quality. Otherwise you lose the connection to the outside objects being represented, and therefore your measurements of quality become meaningless.
When you consider data quality a function of its use by some person, then again, metadata plays a crucial role in ensuring the data is used properly. Without proper knowledge of the context of data, how else can you know what decisions to base on it?
This leads to a remarkable finding about metadata. Accuracy can be less of an issue for certain non-critical data. This is not the case for metadata, however. No metadata component can be ignored. Metadata must be of such quality that business and technical users can rely on it. Always. If not, either measures of inherent quality become meaningless, or measures of relative data quality are compromised because the data gets used the wrong way.
The Value of Metadata
Obviously there is a cost involved when you invest in a centralized metadata repository. Why should you choose to do so? When my accountant asks me to specify my expenses, I probably spend 98% of the time finding the receipts, 1% adding up the numbers, and another 1% reporting back to her. Note that searching for the receipts is the equivalent of trying to get access to metadata. It equates to determining exactly which receipts I should report.
Metadata saves enormous amounts of time when trying to find a particular piece of information. Productivity studies have shown that when you distract a colleague, it may take that person up to 15 minutes to get back into the flow of work. So on top of the time you both spend trying to make sense of a piece of data, at least one colleague gets distracted. And hopefully the first person you turn to knows what you seek to find out.
Instead of constantly having to bother colleagues, it makes sense to store the collective "tribal" knowledge within your organization in a centralized repository. Eventually everybody will form a habit of turning to it. This will both improve efficiency and lower dependency on certain key employees.
When your data and metadata are scattered throughout the organization, nobody will or can have confidence in the data. The result is more people spending more time verifying numbers, trying to make sense of data, and all this time spent is a net waste. It adds no value whatsoever, and cannot ever be recouped again.
Just like “regular” data from disparate silos, metadata needs to be brought together as well. You consolidate metadata to reduce the cost of operations, but also to enhance organizational knowledge and to provide better (customer) service. Unless you can find all metadata in one central place (with one single log-on), people will not “learn” where they can find it. Needless to say, the usability of this interface is an important factor in adoption.
Metadata and Flossing
Metadata is like flossing in many ways. Most BI professionals buy into the importance of recording metadata. So knowledge and awareness are there. And the same holds for flossing. Your dentist tells you how important it is, and we agree. We know it is good for us, and we know we should be doing it. But somehow, it just doesn’t get done. At least not often enough. And the bad thing with metadata is that once it starts to decay, the value drops exponentially. And once the value is gone, nobody is motivated to do anything to restore it. So we have a pretty strong self-reinforcing loop here.
If you are aware of this dynamic, it is obvious you need to make updating and expanding metadata a top priority. So reward it commensurately. And continue to do so. There are just too many excuses not to do it, and it is easy to fall into the trap of a vicious circle that will eventually drive your effort into the ground. Changing this dynamic requires persistence, and incessantly advertising the value of good metadata. Just like the need for flossing…
Organizing Metadata Ownership
Once the value of owning a metadata repository is clear, the question arises: How do we get started? One of the central questions is where to position metadata ownership in the organization. There are interesting parallels with “regular” data ownership.
Ownership of data is the process of exercising sole authority over the resources being governed. Hence, it is (often) possible to assign ownership of data to a particular business line. The business manager in charge of processes that create the data is a natural proprietor.
Ownership of enterprise processes, procedures, and business metadata, however, can rarely be assigned to one business unit. In most organizations, this isn’t achievable, because it requires coordination of resources from multiple business units. When “true” ownership isn’t feasible, corporations tend to favor concepts of stewardship instead.
Data stewardship is the non-technical portion of responsibilities for the partnership in data management. These concepts are similar for data and metadata alike. Just like data in your enterprise data warehouse are the amalgam of multiple source systems (with different owners), metadata arises as a joint effort across business units.
Metadata repository management is often described as a mini data warehouse process. The launch might be a project, but after that it turns into a process. Inmon et al. (2007): “Building an enterprise metadata repository is not a short-term project. And even after it has been built, it requires an ongoing commitment to keep it current and accurate.”
Business lines have the ultimate responsibility for the data and metadata content. When multiple business units are involved as data gets created, data ownership needs to be replaced by data stewardship. Business units “own” the data, and have knowledge about its creation, its peculiarities, and how it gets used. IT acts "merely" as a custodian or caretaker of the physical implementation. IT may be held responsible for data management and the associated metadata repository. But the content should be managed by the business, preferably as independently as possible.
To achieve knowledge capture, it must be easy. It should fit into the daily routine of business people. If they have to go out of their way to record it, it won’t happen. Likewise, the access to metadata has to be easy. Otherwise your repository will not get used, and you will not benefit from possible efficiency increases.
References:
Bill Inmon, Bonnie O’Neill & Lowell Fryman (2007), Business Metadata: Capturing Enterprise Knowledge. ISBN# 0123737265
David Marco (2000), Building and Managing the Metadata Repository. ISBN# 0471355232
Hans Wegener (2007), Aligning Business and IT with Metadata: The Financial Services Way. ISBN# 0470030313
SOURCE: Metadata & Data Quality
Recent articles by Tom Breur