Blog: David Loshin

David Loshin

Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!

About the author

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at (301) 754-6350.

Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!

January 2008 Archives

Why do so many people directly link master data management with customer data? Maybe because we have been dealing with customer data for so long that when a new buzzword appears, we immediately try to link what we are doing to the "latest craze" to ensure our mindshare among the stakeholders.

However, the more I think about MDM and product data, the more intrigued I am. I have said this in a number of meetings: product names are curious because they often describe what they are. For example, a PHILLIPS SCREWDRIVER 6-3/4" is a Phillips screwdriver that is 6 and 3/4 inches long. What is more, product descriptions carry a lot of information that can be parsed out relatively easily using standard text analysis and text mining techniques. So I would very much be interested in hearing more about some product information MDM projects - email me or post your success stories!
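To make the screwdriver example concrete, here is a minimal sketch of that kind of parsing. It is an illustration only: the names `SIZE_PATTERN` and `parse_product_name` are hypothetical, and a single regular expression stands in for the much richer text analysis a real product MDM project would need. It assumes the size appears as whole inches with an optional fraction, followed by an inch mark.

```python
import re
from fractions import Fraction

# Assumed size format: whole inches, optional "-num/den" fraction, then an inch mark.
SIZE_PATTERN = re.compile(r'(?P<whole>\d+)(?:-(?P<num>\d+)/(?P<den>\d+))?\s*"')

def parse_product_name(name):
    """Split a product name into descriptive words and a length in inches."""
    length = None
    match = SIZE_PATTERN.search(name)
    if match:
        length = Fraction(int(match.group("whole")))
        if match.group("num"):
            length += Fraction(int(match.group("num")), int(match.group("den")))
        # Remove the size token so only the descriptive words remain.
        name = name[:match.start()] + name[match.end():]
    return {"words": [w.title() for w in name.split()], "length_inches": length}

parsed = parse_product_name('PHILLIPS SCREWDRIVER 6-3/4"')
print(parsed["words"], float(parsed["length_inches"]))
```

Even this toy version recovers a product type, a drive style and a dimension from the name alone, which is the point: the name carries structured attributes that could feed a product master record.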

Posted January 29, 2008 7:46 AM

Today is the 50th anniversary of the Lego block, and an interesting side note is that Lego's Mindstorms product line is one of the few commercial successes of the Logo programming language.

Posted January 28, 2008 5:59 AM

I got a postcard from Verizon today. It said:

"We recently sent you a letter in which we advertised a Verizon bundle package of Verizon FiOS Internet and Verizon FiOS TV service. This letter was mailed by mistake and the services described in the letter have never been offered by Verizon under those terms.

We apologize for this error.

Verizon Consumer Marketing"

OK, seriously, I am finding it hard to get my head around this. The offer came in one of those pseudo-overnight envelopes that marketers often use to make their letter seem more credible - you know, cardboard weight with a zip-pull - not cheap. So this company:

- Drafts a marketing letter,
- Prints tens of thousands of copies,
- Custom prints tens of thousands of fancy cardboard envelopes,
- Puts the letters into those envelopes, and
- Mails them.

Actually, I am guessing about the number - it could be orders of magnitude greater, for all I know.

I find it hard to believe that the internal governance and control over marketing would not have stopped the process after the marketing letter had been drafted if it contained erroneous information, so I am curious as to what has really happened. I mean, in fact, having sent out the previous letter, the company actually did offer the services under those terms, but perhaps, due to some error, was not prepared to honor that offer.

In any event, my guess would be that there were some significant negative business impacts related to this bundle blunder - actual hard costs for materials and postage, as well as softer costs relating to organizational trust.

Posted January 24, 2008 1:35 PM

There is an oft-quoted statistic about the growth rate of data volumes that I wanted to use in some context, and I started searching for a source. I googled "data volumes" +"double every" to see what I could find. To my surprise, there were lots of hits, but it is difficult to pin down the exact parameters. Lots of folks are using the statistic:

"Data doubles every year"
"The amount of stored data from corporations nearly doubles every year"
"...the amount of data stored by businesses doubles every year to 18 months."
"In his book “Simplicity,” business management expert and author Bill Jensen indicates that the most conservative estimates show business information doubling every three years, while some estimates say data doubles every year. "
"Unstructured data doubles every three months"

I am still following links from the first page of results, and we are doubling our data every 3 to 18 months.

"Reed's Law states that the volume of data doubles every 12 months. "

OK, so there is actually a law about it. Hold on a second, according to Wikipedia this law is about the utility of (social) networks, so perhaps the law doesn't apply in all jurisdictions.

Anyway, these may all be references to a UC Berkeley study on the growth of data, which said that the amount of information stored on media such as hard disk drives doubled between 2000 and 2003.
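To see how far apart these claims really are, it helps to convert each doubling period into a yearly growth factor. The sketch below is simple arithmetic, not something taken from any of the quoted sources: the function name `annual_growth_factor` is hypothetical, growth is assumed to be steady and exponential, and the Berkeley figure is read as a doubling period of roughly 36 months.

```python
def annual_growth_factor(doubling_months):
    """Yearly multiplier implied by a doubling period, assuming steady exponential growth."""
    return 2 ** (12 / doubling_months)

# Doubling periods quoted above; 36 months is an assumed reading of "doubled between 2000 and 2003".
for months in (3, 12, 18, 36):
    factor = annual_growth_factor(months)
    print(f"doubles every {months} months -> x{factor:.2f} per year")
```

Under these assumptions, doubling every three months means sixteen times as much data after one year, while doubling every three years means about 26 percent more. The quoted statistics differ by more than an order of magnitude in yearly growth.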

So let's look at this a little more carefully - we have a scientific study that looks not at the creation of data, but rather at the use of storage media to hold what is out there. And out there is a lot of stuff needing a lot of storage, like images, music, videos, etc. These are things that hold information, yet from which it is still a challenge to extract data. Also, consider that for each thing out there, there are likely to be a lot of copies! I am sure that a scan of all the TiVos in the country would demonstrate that lots of people are still catching up on older episodes of 24 and American Idol.

I need to refine my question a little bit, then, but I am afraid it will be difficult to track down defensible sources for it. I am more interested in knowing about the growth rate for data that can be integrated into an actionable information environment. I may not care about the bits comprising that specific episode of 24 that is sitting on millions of DVRs, but as an advertiser, I might be interested in profiling which households have watched which episodes and at what kind of time shift.

Anyone have any ideas?

Posted January 23, 2008 10:48 AM

I will be at the Data Warehousing Institute world conference in Las Vegas on Feb 19-21. If you are attending and would like to schedule a conversation, please contact me. Looking forward to seeing you there!

Posted January 22, 2008 10:43 AM