By Carol Newcomb, Senior Consultant
Minding Your Metadata
The second part of my summertime primer addresses ‘Minding your Metadata’. I can just hear the collective groans and yawns now. Sorry, but metadata collection is one of those necessary evils that may not be fun in the doing, but having it available as a resource to understand your data and use it appropriately is invaluable. And you just might find some interesting surprises along the way!
Metadata: What Is It & Why Do I Need It?
As you start your Root Cause Analysis (see last week’s primer), you first need to examine existing data definitions (or lack thereof). Metadata is the foundation of good data management and forms the basis for Data Governance. Pardon me for stating the obvious, but metadata is fundamental to resolving data issues, and it is the first place to look when investigating data quality problems.
Metadata is “data about data”. Plain and simple. It includes descriptive information about electronic data used in common daily business practice. Metadata includes items usually found in a data dictionary: field name, field length, retention rules, and security access, as well as additional descriptive information that may include data origin (source or system), creation/entry date, method of creation (key-entry or the result of a calculation), purpose of the data (its intended use), how frequently it gets updated or refreshed, and current location in a database (table, view, schema). If a data element is the result of calculation logic or groupings (such as age categories), those business rules used to generate the resulting data values should be collected as part of the metadata.
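To make this concrete, here is a minimal sketch of what one data dictionary entry might look like as a small Python record. The class name, field names, sample values, and age cut-offs are all assumptions made up for illustration, not a standard schema. The point is simply that the business rule behind a derived value (here, an age category) is stored right alongside the element it produces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataElement:
    """One hypothetical data dictionary entry (illustrative field names)."""
    field_name: str
    field_length: int
    source_system: str          # data origin
    creation_method: str        # "key-entry" or "calculation"
    purpose: str                # intended use of the data
    refresh_frequency: str      # how often it gets updated
    location: str               # schema and table (or view)
    retention_rule: str
    security_access: str
    business_rule: Optional[str] = None  # logic behind derived values


def age_category(age: int) -> str:
    """Example grouping rule; the cut-offs are invented for this sketch."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 18:
        return "0-17"
    if age < 65:
        return "18-64"
    return "65+"


# A derived element whose metadata records the rule that generates it.
age_group = DataElement(
    field_name="AGE_GROUP",
    field_length=5,
    source_system="registration system",
    creation_method="calculation",
    purpose="demographic reporting",
    refresh_frequency="nightly",
    location="reporting.patient_summary",
    retention_rule="7 years",
    security_access="analysts",
    business_rule="age < 18 -> '0-17'; 18 <= age < 65 -> '18-64'; else '65+'",
)

print(age_group.field_name, age_category(42))
```

In practice these entries would live in a database or a metadata tool rather than in code, but the attributes captured would be much the same.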
A good example of metadata that you may use every day would be ‘document properties’ in a Word document. This feature captures data on the original document creation date, most recent access and update times, document creator, count of characters, words and pages. If the document should be private, this will be indicated in its properties. You may also tag the document by indicating key words in order to make it easier to find by you or others.
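The same idea applies to any file on your computer. The sketch below reads a few file-system properties using Python's standard `os.stat` call. Note that these are the operating system's properties for the file, not Word's own document properties (author, word count, key words), which would need a separate library to read. The function name and the throwaway demo file are assumptions for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_properties(path: str) -> dict:
    """Collect a few pieces of file-system metadata for one file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "last_accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }


# Demonstrate on a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("metadata")
    demo_path = handle.name

props = file_properties(demo_path)
os.remove(demo_path)  # clean up the demo file
print(props["name"], props["size_bytes"])
```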
A few of the benefits of Metadata Management include:
- Clarify rules for data entry
- Reduce ambiguity around appropriate use of data elements
- Eliminate problems associated with not having data definitions, business rules or transformation logic available
- Validate legitimate values at the data element level
- Provide evidence to regulators that security and confidentiality are protected
- Centralize the storage and accessibility of metadata for end-users
- Reduce the amount of effort required to research data results.
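As one illustration of validating legitimate values at the data element level, the sketch below checks a record against a small set of metadata rules. The element names, valid values, and lengths are hypothetical examples, not rules from any real system.

```python
# Hypothetical metadata: maximum length and valid values per data element.
ELEMENT_RULES = {
    "gender": {"max_length": 1, "valid_values": {"F", "M", "U"}},
    "state": {"max_length": 2, "valid_values": {"IL", "CA", "NY"}},
}


def validate_record(record: dict, rules: dict = ELEMENT_RULES) -> list:
    """Return a list of problems found; an empty list means the record passed."""
    problems = []
    for field, value in record.items():
        rule = rules.get(field)
        if rule is None:
            # No definition on file: exactly the gap metadata management closes.
            problems.append(f"{field}: no metadata definition found")
            continue
        if len(str(value)) > rule["max_length"]:
            problems.append(f"{field}: '{value}' exceeds length {rule['max_length']}")
        if value not in rule["valid_values"]:
            problems.append(f"{field}: '{value}' is not a valid value")
    return problems


for problem in validate_record({"gender": "F", "state": "Illinois", "zip": "60601"}):
    print(problem)
```

A record that passes returns an empty list, so the same check can run at data entry or as part of a data quality review.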
A Metadata Management Repository is a central location or system to collect and store metadata that may exist in disparate parts of the organization (data dictionaries, systems, spreadsheets, or people’s brains). The metadata repository stores detailed definitions centrally on a network where other users can find them.
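A toy version of such a repository can be sketched in a few lines. The class, method names, and sample definitions below are assumptions for illustration only; a real repository would sit in a database or a dedicated metadata tool, with security and version history around it.

```python
class MetadataRepository:
    """A toy central store for data element definitions (in memory only)."""

    def __init__(self):
        self._definitions = {}

    def register(self, element_name: str, definition: str, source: str) -> None:
        """Add or update a definition, recording where it came from."""
        self._definitions[element_name.lower()] = {
            "name": element_name,
            "definition": definition,
            "source": source,
        }

    def lookup(self, element_name: str):
        """Return the stored entry, or None if the element is undefined."""
        return self._definitions.get(element_name.lower())

    def search(self, keyword: str) -> list:
        """Find element names whose definitions mention a keyword."""
        keyword = keyword.lower()
        return sorted(
            entry["name"]
            for entry in self._definitions.values()
            if keyword in entry["definition"].lower()
        )


# Definitions gathered from different corners of the organization.
repo = MetadataRepository()
repo.register("ADMIT_DT", "Date the patient was admitted", source="IT data dictionary")
repo.register("DISCH_DT", "Date the patient was discharged", source="analyst spreadsheet")
print(repo.search("patient"))
```

Recording the source of each definition is a deliberate choice here: when two departments define the same element differently, you want to know where each version came from.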
There are three general categories of metadata that should be included in this repository:
Business Metadata – Business metadata attributes facilitate identification, understanding, and appropriate use of existing data elements. These include clear business names and descriptions, relevant business rules, descriptions of the data sources, security and privacy rules, etc.
Technical Metadata – Describes the technical attributes of data such as physical location (host server, database server, schema, etc.), data types, any transformations applied and domain of valid values, relationships to other data elements, precision, and lineage. Technical metadata is used by business users and by IT staff to design efficient databases, queries, and applications, and to reduce duplication of data.
Operational Metadata – Describes the attributes of routine operations on data and related statistics. These include job schedules and descriptions, data movement and transformation processes, data read, update and performance statistics, volume statistics, backup and archival information. Operational metadata is used by operations staff and DBAs to tune the system and ensure its continued efficient operation. It is also used by business users to track such events as “last use” of a field, and “last load” of a data element.
Exciting stuff, huh? Well, the whole point of metadata is to have the information about data available to a multitude of users when they need it, to keep it current, and to avoid confusion around usage. So if you appreciate having a clean bathroom, and knowing where you keep your antiperspirant, you will also appreciate having good metadata! The time for spring cleaning is well overdue.
Carol Newcomb is a Senior Consultant with Baseline Consulting. She specializes in developing BI and data governance programs to drive competitive advantage and fact-based decision making. Carol has consulted for a variety of health care organizations, including Rush Health Associates, Kaiser Permanente, OSF Healthcare, the Blue Cross Blue Shield Association and more. While working at the Joint Commission and Northwestern Memorial Hospital, she designed and conducted scientific research projects and contributed to statistical analyses.
Posted June 17, 2010 6:00 AM