Blog: David Loshin


Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!

About the author >

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions, including information quality consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at (301) 754-6350.

Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!

Recently in Master Data Management Category

Over the past few years we have been evaluating issues, practices, techniques, and processes for justifying, planning, designing, and managing a successful master data management (MDM) program. My book, Master Data Management (The MK/OMG Press), was recently published by Morgan Kaufmann and is now available.

Now that MDM has been around for a few years, I believe that to make MDM programs successful, we need to jump off the hype curve and start considering the real issues that will be faced when attempting to build a reasonable business case and initiate the program. In this book, I look at the business drivers, identify key stakeholder archetypes, review management processes such as data governance and organizational change, and then look at the techniques and tools to make MDM work. I hope that this book sounds intriguing and that you'll get a copy. If you get a copy and you like it, I would be thrilled for you to post a review at Amazon as well!

In addition, I have put together a companion web site to continue the MDM discourse and to provide extra insights as more case studies and success stories emerge. I have the support of a number of great technology sponsors to build the content, and I am definitely interested in hearing about your MDM ideas as well. Please visit the site and let me know what you think!

Posted October 28, 2008 8:05 AM
Permalink | 1 Comment |

As was rumored a few weeks back (see my previous blog entry), Microsoft continues its acquisitions in the data quality/MDM arena with its announced purchase of Israeli data quality company Zoomix. This should yet again add to Microsoft's growing incorporation of data quality and data mining capabilities into its SQL Server/Office productivity platform...

Posted July 14, 2008 6:34 AM
Permalink | No Comments |

Just heard through the rumor mill that Microsoft is "thinking about acquiring Zoomix." It would make sense that Microsoft might consider bulking up in its capability to support a potential MDM offering (note last year's acquisition of Stratature). I would look forward to seeing something official, though...

Posted June 6, 2008 3:26 PM
Permalink | No Comments |

Hey, sorry it has been a while since my last blog entry. I have been focused on finishing my book on master data management (MDM), which thankfully is now done. Some interesting thoughts gelled over the past six months of furiously assembling material for the book, which is now due to be published in the fall by Elsevier:

- MDM is more of a means than an end, and it is more likely to be justified in the context of other enterprise activities such as CRM or ERP.

- I have started to bristle at the phrase "golden copy." I now think that MDM is more about providing universal, transparent access to a single representation of uniquely identifiable entity data, but that does not mean that entity data has to sit in its own silo.

- Comprehensive master metadata should include more than just data dictionary information.

Stay tuned for more information on the book...

Posted April 25, 2008 12:18 PM
Permalink | No Comments |

Earlier this week I attended the MDM Insight event that TDWI ran in Savannah, GA. The hosted event employed a different model than other TDWI events, in which qualified participants were invited to attend, and vendor sponsors were provided with direct access to demonstrate their products' capabilities.

One of my roles at the event was to moderate a short workshop session to help attendees articulate what they believed were their most critical needs for master data management. One interesting common reaction was confusion about what composes an MDM solution and what the vendors are actually selling. Another frequent reaction was difficulty in lining up the requisite set of ducks within a reasonable amount of time to garner enough "horizontal support." Third, a general consensus was that instituting MDM was best done as an adjunct to existing application development (e.g., to support BI), focusing on small projects.

Actually, that last one confused me a bit, since if it centers only on a small application area (and not the whole enterprise), can it really be "master data" management?

Oh, one more thing - it may be worthwhile to consider the qualitative (and feasibility) differences between creating a "single golden source of truth" and an environment supporting the transparent access to a unified view of uniquely identifiable master objects (my current definition of what MDM is, by the way).
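To make that definitional contrast concrete, here is a minimal, hypothetical sketch (all names invented) of the "transparent access" style: a registry holds only identifiers and pointers into the source systems, and the unified view is assembled on demand rather than copied into a separate golden silo.

```python
class MasterRegistry:
    """Toy registry-style MDM index: it stores no business data itself,
    only a mapping from master identifiers to source-system records."""

    def __init__(self):
        self._index = {}   # master_id -> list of (system, local_key)

    def register(self, master_id, system, local_key):
        # Link one uniquely identified master object to a source record.
        self._index.setdefault(master_id, []).append((system, local_key))

    def unified_view(self, master_id, fetchers):
        """Assemble a consolidated view on demand; `fetchers` maps each
        system name to a lookup function into that system's own store."""
        view = {}
        for system, key in self._index.get(master_id, []):
            view.update(fetchers[system](key))
        return view
```

The qualitative difference is that the "single golden source" approach physically consolidates the data, while this approach leaves the data where it lives and guarantees only that everyone resolving a master identifier sees the same unified answer.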

Posted March 5, 2008 12:45 PM
Permalink | No Comments |

Why do so many people directly link master data management with customer data? Maybe because we have been dealing with customer data for so long that when a new buzzword appears, we immediately try to link what we are doing to the "latest craze" to ensure our mindshare among the stakeholders.

However, the more I think about MDM and product data, the more intrigued I am. I have said this in a number of meetings: product names are curious because they often describe what they are. For example, a PHILLIPS SCREWDRIVER 6-3/4" is a Phillips screwdriver that is 6-3/4 inches long. What is more, product descriptions carry a lot of information that can be relatively easily parsed out using standard text analysis and text mining techniques. So I would very much be interested in hearing more about some product information MDM projects - email me or post your success stories!
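To make that parsing point concrete, here is a toy sketch; the regular expression and the attribute names are my own invention for illustration, not any product-data standard:

```python
import re

# Illustrative pattern: a whole-inch or fractional-inch size followed
# by a double-quote mark, e.g. 6-3/4"
SIZE = re.compile(r'(\d+(?:-\d+/\d+)?)\s*"')

def parse_product(description):
    """Pull hypothetical structured attributes out of a product description."""
    attrs = {}
    m = SIZE.search(description)
    if m:
        attrs["length_inches"] = m.group(1)
        description = SIZE.sub("", description)   # strip the size token
    tokens = description.split()
    if tokens:
        # Treat the last remaining word as the item type and the rest
        # as qualifiers -- a crude heuristic, good enough to illustrate.
        attrs["item_type"] = tokens[-1].lower()
        attrs["qualifiers"] = [t.lower() for t in tokens[:-1]]
    return attrs
```

For the screwdriver above, `parse_product('PHILLIPS SCREWDRIVER 6-3/4"')` yields a length of `6-3/4` inches, an item type of `screwdriver`, and a `phillips` qualifier. Real product MDM tools use far richer parsing, but the principle is the same: the description itself carries the attributes.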

Posted January 29, 2008 7:46 AM
Permalink | 1 Comment |

I have been thinking about MDM and the need to incorporate all data sets that describe a specific master object, and some of the issues surrounding supplied data. The appeal of mastering disparate data sets that represent the same conceptual data objects often leads to an enthusiasm for consolidation in which individuals may neglect to validate that data ownership issues will not impede the program. In fact, many organizations use data sourced from external parties to conduct their business operations, and that external data may appear to suitably match the same business data objects that are to be consolidated into the master repository.

However, there may be issues regarding ownership of the data and contractual obligations relating to the ways that the data is used; these are some areas that require care:
• Licensing arrangements – data providers typically license the use of the data being provided, as opposed to "selling" the data for general use. This means that the data provider contract will be precise in detailing the ways that the data is licensed: for review by named individuals, for browsing directly through provided software, or for comparisons without being copied or stored. Such license restrictions may prevent consolidating the external data into the master.
• Usage restrictions – more precisely, some external data may be provided or shared for a particular business reason and may not be used for any other purpose. This differs subtly from the licensing restrictions in that many individuals may be allowed to see, use, or even copy the data, but only for the prescribed purpose. Therefore, using the data for any other purpose that would be enabled by MDM would violate the usage agreement.
• Segregation of information – in this situation, information provided to one business application must deliberately be quarantined from other business applications due to a “business-sensitive” nature, which also introduces complexity in terms of data consolidation.
• Obligations upon termination – typically, when the provider arrangement ends, the data customer is required to destroy all copies of the provided data; if the provider data has been integrated into a master repository, to what degree does that co-mingling "infect" the master? This obligation would make it almost impossible to include external data in a master repository without introducing significant safeguards to identify data sources and to provide selective roll-back.
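On that last point, one way to keep selective roll-back feasible is to tag every consolidated value with its originating source. A minimal sketch, with invented names, of what that bookkeeping might look like:

```python
class MasterRecord:
    """Hypothetical master record that remembers where each value came from."""

    def __init__(self):
        self._values = {}   # attribute -> (value, source)

    def consolidate(self, attribute, value, source):
        # Record the contributing source alongside the value itself.
        self._values[attribute] = (value, source)

    def purge_source(self, source):
        """Selective roll-back: drop every value a terminated provider contributed."""
        self._values = {a: vs for a, vs in self._values.items()
                        if vs[1] != source}

    def as_dict(self):
        return {a: v for a, (v, _src) in self._values.items()}
```

A production system would need this lineage at much finer grain (and would have to handle values derived from a mix of sources), but without at least this much tagging, honoring a destruction obligation means rebuilding the repository from scratch.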

Posted November 20, 2007 2:34 PM
Permalink | 1 Comment |

Last week at the TDWI conference in Orlando, I had the chance to chat briefly with Philip Russom, who has assembled some very nice research papers this year on data quality and on master data management. One comment about his latest effort on MDM that I found intriguing was that his research suggested that a large number of MDM projects are done on behalf of finance activities, often in the area of accounting (GL, chart of accounts, item lists, etc.). I thought a large part of that data was what we might call "reference data," not necessarily "master data." On the one hand, it is good to see the kinds of governance relevant for financial activities being applied to data.

However, his comment drives back to a question I must have heard 10 times down there - what is the difference between master data and reference data? This underlies an even more challenging question - how do you define "master data"? We have a lot of descriptions of master data, but nothing definitive. Dan Linstedt has done a good job of tracking some of these questions in his blog, but I think it is about time to nail this definition down. Any suggestions?
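For what it's worth, here is one toy way to frame the distinction; this is my framing for illustration, not a settled definition, and all the names are invented:

```python
# Reference data: a small, relatively stable, often externally agreed
# code set that constrains the values other data may take.
CURRENCY_CODES = {"USD", "EUR", "GBP"}

# Master data: the uniquely identified business entities themselves,
# which carry their own attributes and merely *use* the reference codes.
customers = {
    "CUST-001": {"name": "Acme Corp", "billing_currency": "USD"},
    "CUST-002": {"name": "Globex",    "billing_currency": "EUR"},
}

# The relationship runs one way: reference data constrains master data.
assert all(c["billing_currency"] in CURRENCY_CODES
           for c in customers.values())
```

Under this framing, a chart of accounts looks a lot more like the code set than like the entities, which is why the finance-driven projects Russom describes feel like reference data management to me.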

Posted November 13, 2006 8:09 AM
Permalink | 4 Comments |

AOL has admitted its goof in publishing huge amounts of search data that was only questionably anonymized. The New York Times describes some details of a person identified through analysis of the released search data, as was reported in Martin McKeay's blog entry.

I always have a dual reaction to the uproar over the privacy issues associated with the release of this kind of data. First, I am amused that a big company like AOL doesn't have the governance controls in place to assess the public's reaction to the publication of what might be considered sensitive data. Second, I am surprised that "The Public" is concerned over the exposure of what they suddenly consider to be private information, when in fact the privacy policy states that the data may be presented to others in a nonidentifiable way ("(others) ...receive aggregate data about groups of AOL Network users, but do not receive information that personally identifies you").

Of course, AOL thought that the released data was presented in a way that did not personally identify anyone. The fact that others were able to extract identifiable information from presumably anonymized data should be a wake-up call for AOL to review how its governance practices are deployed to ensure it is abiding by its own policies.
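A toy illustration of why the anonymization failed: because each user kept a single persistent pseudonym, all of that user's queries stay linked, and together they act as quasi-identifiers. The data below is invented in the spirit of the reported case:

```python
from collections import defaultdict

# "Anonymized" release: names replaced by a persistent numeric ID.
# (User ID and queries are made up for illustration.)
released = [
    (9876543, "pediatricians in springfield"),
    (9876543, "homes sold in maple grove subdivision"),
    (9876543, "hand tremors in older adults"),
]

# Re-linking is trivial: group by the pseudonym.
profiles = defaultdict(list)
for user_id, query in released:
    profiles[user_id].append(query)

# One pseudonym now carries a town, a neighborhood, and a health clue --
# enough overlapping quasi-identifiers to start narrowing down a person.
```

The governance lesson is that removing direct identifiers is not the same as making data nonidentifiable; the policy language promises the latter, and the release only achieved the former.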

Posted August 10, 2006 5:50 AM
Permalink | 1 Comment |

I am actually writing this entry in real time during my (and Malcolm Chisholm's) DAMA/Meta Data Conference tutorial on Effective Management of Master Data. A question was asked about allowing updates of local copies of master data objects within operational applications. I immediately commented that allowing this introduces coherence issues between the application copies and the master copy, and that one must ensure that policies exist for coherence management if local updates are to be allowed. Of course, we have to realize that this issue is not a new one - it has been around for a long time, both in the data world (transactional semantics) and in the hardware world (cache and memory coherence).

I would be surprised if there are any MDM systems that allow for local updates without some embedded transactional semantics.
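As a sketch of what the simplest such policy might look like, here is a hypothetical optimistic-versioning scheme (all names invented): a local application may write back changes only against the version of the master it originally read, so a stale write is rejected rather than silently clobbering someone else's update.

```python
class MasterCopy:
    """Toy master record guarded by an optimistic version number."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = 1

    def read(self):
        # Hand out a snapshot plus the version it corresponds to.
        return dict(self.data), self.version

    def write_back(self, changes, read_version):
        """Apply changes only if no other update intervened since the read."""
        if read_version != self.version:
            return False          # stale: caller must re-read and retry
        self.data.update(changes)
        self.version += 1
        return True
```

This is the same idea as cache coherence protocols and database optimistic concurrency control: detect the conflict at write-back time instead of locking the master for every reader.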

Posted April 24, 2006 2:07 PM
Permalink | No Comments |