Blog: David Loshin

David Loshin

Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!

About the author

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at (301) 754-6350.

Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!

May 2007 Archives

I had a conversation the other day with one of my former colleagues, and I asked him his opinion about whether approximate matching and semantic techniques would be integrated into search engines. His response surprised me: he told me that he had read that over 90% of Google searches involve a single word, and that with so little information, the engine doesn't have much to work with. Therefore, was it really worth adding this functionality if, for the most part, it would add computation time but only benefit a small number of searchers?

That, of course, shocked me, but maybe it shouldn't have. I thought I was pretty good at googling, mostly because I was able to get pretty good results as a by-product of the feedback I get from each search. For example, you start with a phrase in quotes, and that may be sufficient. If not, you can scan the short results coming back to seek out better phrases to include (or exclude) from the search. Others are much more comprehensive in their searching, using qualifiers and key tokens to enhance their search (e.g., Johnny Long, who will be a keynote speaker at the upcoming Data Governance conference in San Francisco, at which I will also be speaking, by the way).
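The quoted-phrase approach described above, refined with terms to include or exclude, can be sketched in a few lines of Python. This is only an illustration: the function name build_query and the sample terms are hypothetical, and it assumes the search engine accepts double quotes for an exact phrase and a leading minus sign to exclude a term, as Google does.

```python
def build_query(phrase, include=(), exclude=()):
    """Build a search string: a quoted phrase plus refinements.

    Assumes the engine treats "..." as an exact phrase and a
    leading '-' as an exclusion (hypothetical helper, not an API).
    """
    def quote_if_needed(term):
        # Multi-word terms are quoted so they stay together.
        return f'"{term}"' if " " in term else term

    parts = [f'"{phrase}"']
    parts += [quote_if_needed(t) for t in include]
    parts += ["-" + quote_if_needed(t) for t in exclude]
    return " ".join(parts)


# Start with the phrase alone, then refine after scanning the results.
first_try = build_query("data governance")
refined = build_query("data governance",
                      include=["conference"],
                      exclude=["jobs"])
```

Each pass through the results suggests another term to add to include or exclude, which is the feedback loop described above.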

But perhaps the general computer user is not so sophisticated, and may need some suggestions. Anyone want to contribute their favorite search strategies?

Posted May 25, 2007 6:33 AM

This past week I attended TDWI, and was lucky enough to conduct a podcast interview with Philip Russom, TDWI's Senior Manager of Research and Services. In our conversation, we hit upon a number of interesting topics regarding upcoming trends in business intelligence, but most interesting was his opinion about the integration of search technology into BI in a way that is likely to change the way we think about discovering actionable knowledge.

Posted May 18, 2007 6:21 AM

My company has been involved in a lot of data governance work recently. Two of the main drivers are regulatory compliance and consistency in reporting (which often rolls back to compliance). Interestingly, in some of the client industries, fraud detection seems to be an additional driver. This is a little curious to me. On the one hand, fraud detection fits into the compliance framework - looking for non-conformance to business policies. In both cases, we essentially identify critical policies, define rules that indicate conformance to those policies, and generate alerts when those policies are violated.
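The shared pattern (critical policies, rules that indicate conformance, alerts on violation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a description of any client's system: the Rule class, the check function, the policy names, the record fields and the 10,000 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    policy: str                        # the critical policy the rule supports
    description: str                   # what conformance means
    conforms: Callable[[Dict], bool]   # True when the record conforms


def check(record: Dict, rules: List[Rule]) -> List[str]:
    """Return one alert for every rule the record violates."""
    return [
        f"ALERT [{rule.policy}] {rule.description}"
        for rule in rules
        if not rule.conforms(record)
    ]


# Hypothetical rules; the same mechanism serves compliance and fraud checks.
RULES = [
    Rule("reporting-consistency",
         "revenue must be non-negative",
         lambda r: r.get("revenue", 0) >= 0),
    Rule("expense-approval",
         "amounts over 10000 need an approver",
         lambda r: r.get("amount", 0) <= 10000 or bool(r.get("approver"))),
]

alerts = check({"revenue": 500, "amount": 20000}, RULES)
```

In this sketch, alerts holds a single entry for the expense-approval policy, because the record has a large amount and no approver. Whether a rule is inward-facing or outward-facing, it plugs into the same checking loop.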

The difference is that compliance is introspective while fraud detection is outward-looking. Compliance seeks to guard your own behavior, looking at how the organization is living up to everyone else's expectations. Fraud detection seeks to figure out how your own rules are being transgressed by others.

I can imagine another significant difference - fraud is performed proactively, with the perpetrators intentionally trying to avoid detection. Compliance issues are potentially intentional, but inadvertent non-compliance is certainly targeted by control processes.

This raises a different business challenge: it may be possible that there are corporate business policies that conflict with externally-imposed regulations. If so, does the issue of compliance change from self-policing to weighing the risk of noncompliance against the risk of getting caught? And if the latter is the case, it suggests that internal governance programs are "window-dressing," especially when the real (i.e., intentional) transgressions are going to be well-hidden.

Posted May 13, 2007 5:46 PM

I will be at The Data Warehousing Institute's world conference next week in Boston, MA between May 14 and May 16. I will be teaching two courses, one on data quality management and a new one on Network and Link Analysis, which I hope will be thought-provoking. I also plan to do some B-Eye podcasting on Tuesday afternoon, and general hobnobbing and hanging out to hear about what attendees are interested in these days.

If you plan to be there, drop me an email and let's try to schedule time to say hi!

Posted May 8, 2007 7:39 AM

Last year, I took a look at some open source BI tools being marketed by some promising companies, and was confounded by the "minimalist" approach to documentation and support. In one instance, a support question was directed back to the company's online forum ("Do a search, you'll find the answer."), but without any indexing or FAQ, I grew increasingly frustrated. Now, I don't consider myself a dumb guy, and have even done a bit of programming and installations in my day, so the conclusion I drew was that if I couldn't get it running relatively quickly, there might be an opportunity for the company to improve its documentation.

Last night, I had the opportunity to chat with my colleague Mark Madsen while hanging out at Informatica World, and this same topic came up. Apparently, my experience was not unique. I really think, though, that this is a significant problem, since open source is often used to lower the barrier to entry for introducing new ideas into certain kinds of organizations. I hope that over time, those companies trying to run businesses out of open source BI tools can emulate the Linux/Apache model and make it easy to build an open source BI solution.

Posted May 3, 2007 10:31 AM