Blog: David Loshin

David Loshin

Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!

About the author

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at (301) 754-6350.

Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!

November 2007 Archives

I will be speaking at a live breakfast event on Data Quality and Identity Resolution (aka record linkage, matching, deduping, ...) in Washington, DC on Dec 11. If you are interested in the topic and are in the area, please register and attend! It is sponsored by Identity Systems.

Posted November 29, 2007 8:22 AM

I have been thinking about MDM and the need to incorporate all data sets that describe a specific master object, and some of the issues surrounding supplied data. The appeal of mastering disparate data sets that represent the same conceptual data objects often leads to an enthusiasm for consolidation in which individuals may neglect to validate that data ownership issues will not impede the program. In fact, many organizations use data sourced from external parties to conduct their business operations, and that external data may appear to suitably match the same business data objects that are to be consolidated into the master repository.

However, there may be issues regarding ownership of the data and contractual obligations governing the ways that the data is used. These are some that might require care:
• Licensing arrangements – data providers probably license the use of the data that is being provided, as opposed to “selling” the data for general use. This means that the data provider contract will be precise in detailing the ways that the data is licensed: for example, for review by named individuals, for browsing and review directly through provided software, or for comparisons only, with no copying or storage. License restrictions of this kind will prevent consolidating the external data into the master.
• Usage restrictions – more precisely, some external data may be provided or shared for a particular business reason and may not be used for any other purpose. This differs subtly from the licensing restrictions in that many individuals may be allowed to see, use, or even copy the data, but only for the prescribed purpose. Therefore, using the data for any other purpose that would be enabled by MDM would violate the usage agreement.
• Segregation of information – in this situation, information provided to one business application must deliberately be quarantined from other business applications due to a “business-sensitive” nature, which also introduces complexity in terms of data consolidation.
• Obligations upon termination – typically, when the provider arrangement ends, the data customer is required to destroy all copies of provided data; if the provider data has been integrated into a master repository, to what degree does that co-mingling “infect” the master? This restriction would make it almost impossible to include external data in a master repository without introducing significant safeguards to identify data sources and to provide selective roll-back.
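
To make the "identify data sources and provide selective roll-back" safeguard a little more concrete, here is a minimal Python sketch. It assumes every value loaded into the master is tagged with the source that supplied it, so that one provider's contributions can be removed later. The class names (MasterRecord, MasterRepository), the source labels ("crm", "vendor_x"), and the sample attributes are all hypothetical and chosen only for illustration; this is not how any particular MDM product works.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    """One master entity; every value remembers which source supplied it."""
    entity_id: str
    # attribute name -> list of (source, value) pairs, in contribution order
    values: dict = field(default_factory=dict)

    def contribute(self, source, attribute, value):
        self.values.setdefault(attribute, []).append((source, value))

    def current(self, attribute):
        """Return the most recently contributed surviving value, or None."""
        history = self.values.get(attribute, [])
        return history[-1][1] if history else None


class MasterRepository:
    """Hypothetical master repository keyed by entity identifier."""

    def __init__(self):
        self.records = {}

    def load(self, source, entity_id, attributes):
        record = self.records.setdefault(entity_id, MasterRecord(entity_id))
        for name, value in attributes.items():
            record.contribute(source, name, value)

    def roll_back(self, source):
        """Remove every value supplied by `source`; drop records left empty."""
        for entity_id in list(self.records):
            record = self.records[entity_id]
            for name in list(record.values):
                kept = [(s, v) for s, v in record.values[name] if s != source]
                if kept:
                    record.values[name] = kept
                else:
                    del record.values[name]
            if not record.values:
                del self.records[entity_id]


repo = MasterRepository()
repo.load("crm", "cust-1", {"name": "Acme Corp", "phone": "555-0100"})
repo.load("vendor_x", "cust-1", {"phone": "555-0199", "credit_rating": "A"})
repo.load("vendor_x", "cust-2", {"name": "Widget LLC"})

# The vendor_x arrangement ends: remove only what vendor_x supplied.
repo.roll_back("vendor_x")
```

After the roll-back, the internally sourced phone number for cust-1 resurfaces and cust-2, which existed only because of the vendor feed, disappears. Note what the sketch does not cover: any value derived from provider data (a merged "best" value, a match decision) carries no tag here, which is exactly the "infection" question raised above.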

Posted November 20, 2007 2:34 PM

Surely, you could not have been surprised to hear that IBM is buying Cognos. After months of transactions in which business intelligence vendors buy component vendors, only to be purchased by larger vendors as part of "industrial" demand-information programs, it looks like most, if not all, of the major BI suite vendors are now absorbed (Cognos, Hyperion, Business Objects, as well as others such as ProClarity, acquired by Microsoft). This does leave a few morsels on the table, namely MicroStrategy and Information Builders, although whether either is available or up for grabs is just grist for the rumor mill.

These announcements are always somewhat disruptive to the industry, since they shake up expectations about existing alliances and partnerships, and raise the question of whether a solution needs to be a monolithic, stacked one. So here is a quick thought: Large-scale (and pervasive) acquisitions are good for the industry, and are a little like forest fires in that they clear the way for smaller, innovative start-ups to create new tools to fill the void. We might expect that by next year at this time, we will see some interesting vendor offerings that can satisfy the growing market need.


Posted November 12, 2007 7:10 AM

The proliferation of portable GPS devices, embedded within boxes that you install in your car, boat, or airplane, or even carry around attached to your belt in the form of a cell phone, makes me wonder about the growing need to more tightly integrate location data into our business intelligence projects. Actually, I am thinking about two aspects of BI: the first is the inclusion of location data in the collection and analysis arena, and the second, more generic, is embedding predictive analytics into operational systems.
Let's consider the most accessible operational system, the automobile GPS system. Currently, for the most part it looks like a relatively dumb device – no offense intended, of course – whose focus is to identify the current geographic location and to help a driver get from that location to a desired target location. The device may capture history (“my favorite locations,” “my recent locations”), and may have embedded point of interest data sets that help you find the nearest fast food joint or gas station, no matter where you are. But the objective of the device is classically operational – get me from here to there.
But think about all the interesting information that can be gathered from an automobile GPS device. Reduce all inhabitants of the car into a single household unit, then look at where that unit travels to (based on target addresses), how long it takes to get there (actual drive times), whether the driver is an aggressive or passive driver (is the elapsed time greater than or less than the predicted time?), how long between stops (how frequently new locations are entered from target locations), general interests (point of interest lookups), social networks (residence-to-residence trips), and whether the car is being used for business (residence-to-business or business-to-business trips). Oh boy, I could go on for weeks on this topic.
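
As a rough illustration of two of the signals listed above, the following Python sketch compares actual drive time against predicted drive time and labels trips by their endpoint types. Everything in it is an assumption made for the example: the Trip fields, the "residence" and "business" labels, the 5% tolerance, and the idea that a unit would record predicted and actual minutes at all.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    origin_type: str          # hypothetical labels: "residence" or "business"
    destination_type: str
    predicted_minutes: float  # drive time the unit estimated at departure
    actual_minutes: float     # drive time actually observed


def driving_style(trips, tolerance=0.05):
    """Compare total actual time to total predicted time across trips."""
    predicted = sum(t.predicted_minutes for t in trips)
    actual = sum(t.actual_minutes for t in trips)
    if predicted <= 0:
        return "unknown"
    ratio = actual / predicted
    if ratio < 1 - tolerance:
        return "aggressive"   # consistently arrives sooner than predicted
    if ratio > 1 + tolerance:
        return "passive"      # consistently arrives later than predicted
    return "typical"


def trip_purpose(trip):
    """Label a trip by its endpoints, following the pairings in the text."""
    pair = (trip.origin_type, trip.destination_type)
    if pair == ("residence", "residence"):
        return "social"
    if pair in {("residence", "business"), ("business", "business")}:
        return "business"
    return "other"


household_trips = [
    Trip("residence", "business", predicted_minutes=30, actual_minutes=25),
    Trip("business", "business", predicted_minutes=20, actual_minutes=17),
    Trip("residence", "residence", predicted_minutes=10, actual_minutes=10),
]
style = driving_style(household_trips)
purposes = [trip_purpose(t) for t in household_trips]
```

A real analysis would have to account for traffic, weather, and intermediate stops before calling anyone aggressive; the point is only that the raw material for this kind of profile is already passing through the device.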
What prevents this knowledge from being captured? Probably a few things. First, the communication link between the service provider and the unit is unidirectional – data comes from the satellite to the unit, but I do not think that it goes the other way. Second, the devices were engineered to solve the operational problem and therefore there is no inherent design for information capture (in other words – no transaction database). Third, because of the operational design, there is likely to be no centralized repository to collect this information. And so on; I am sure you guys can come up with a bunch of additional reasons.
However, consider that there is already consolidation in the industry, with recent announcements over the summer. One is the emerging bidding war between TomTom and Garmin (both GPS device manufacturers) for geographic data vendor Tele Atlas. Even more interesting is the upcoming acquisition of geographic data vendor Navteq by cell phone manufacturer Nokia. A side note to that last one: Nokia, through its purchase of Intellisync, also owns identity resolution tool company Identity Systems. Hmm, maybe there is potential for this idea after all…

Posted November 8, 2007 6:58 AM