Blog: David Loshin

David Loshin

Welcome to my BeyeNETWORK Blog. This is going to be the place for us to exchange thoughts, ideas and opinions on all aspects of the information quality and data integration world. I intend this to be a forum for discussing changes in the industry, as well as how external forces influence the way we treat our information asset. The value of the blog will be greatly enhanced by your participation! I intend to introduce controversial topics here, and I fully expect that reader input will "spice it up." Here we will share ideas, vendor and client updates, problems, questions and, most importantly, your reactions. So keep coming back each week to see what is new on our Blog!

About the author

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at or at (301) 754-6350.

Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!

One good thing about being busy is that you get opportunities to streamline ideas through iteration. My interest in data profiling goes pretty far back, and profiling is useful in a number of different scenarios. One of these is data quality assessment, especially in situations where not much is known about the data; profiling provides some insight into basic issues with the data.

But in situations where there is some business context regarding the data under consideration, undirected data profiling may not provide the level of focus that is needed. Providing reports on numerous nulls, outliers, duplicates, etc. may be overkill when the analyst already knows which data elements are relevant and which ones are not. In these kinds of situations, the analyst can instead concentrate on the statistical details associated with the critical data elements as a way to evaluate the extent to which data anomalies might impact the business.
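To make that narrower focus concrete, here is a minimal Python sketch that computes a few basic statistics for only the data elements an analyst has named as critical. The function name `profile_elements`, the sample records, and the particular statistics are assumptions chosen for illustration; they do not reflect the output of any specific profiling tool.

```python
from collections import Counter

def profile_elements(records, critical_elements):
    """Return null rate, distinct count, and duplicate count per critical element."""
    total = len(records)
    report = {}
    for name in critical_elements:
        values = [rec.get(name) for rec in records]
        # Treat None and empty strings as nulls (an assumption; adjust per data set).
        non_null = [v for v in values if v not in (None, "")]
        counts = Counter(non_null)
        report[name] = {
            "null_rate": (total - len(non_null)) / total if total else 0.0,
            "distinct": len(counts),
            # Number of non-null values that repeat an earlier value.
            "duplicates": len(non_null) - len(counts),
            "most_common": counts.most_common(3),
        }
    return report

# Hypothetical sample records; "fax" is mostly null but not business-critical.
records = [
    {"customer_id": "C1", "postal_code": "20901", "fax": None},
    {"customer_id": "C2", "postal_code": "", "fax": None},
    {"customer_id": "C2", "postal_code": "20901", "fax": None},
    {"customer_id": "C3", "postal_code": None, "fax": "555-0100"},
]

# Only the elements the analyst cares about are profiled; "fax" is skipped.
report = profile_elements(records, ["customer_id", "postal_code"])
for name, stats in report.items():
    print(name, stats)
```

Because the irrelevant element is never profiled, the report stays limited to the anomalies that could actually affect the business, such as the repeated customer identifier and the missing postal codes in this made-up sample.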

So in some recent client interactions, instead of just throwing the data into the profiler and hoping that something good comes out, we narrowed the focus to just a handful of data elements and increased the scrutiny on the profiler results. Sometimes that meant refining the data sets, pulling different samples, segmenting the data to be profiled, or joining different data sets prior to profiling, all as a way to get more insight into the data instead of the typical reports telling us that yet another irrelevant data element is 99% null. The upshot is that a carefully planned, directed profiling process gave much more interesting results, both for us and for the client.
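Segmenting the data before profiling is one of the easier steps to illustrate. The sketch below, in Python, compares the null rate of a single critical element across segments; the function name `null_rate_by_segment`, the `region` and `ship_date` fields, and the sample orders are all hypothetical and stand in for whatever elements matter in a given engagement.

```python
from collections import defaultdict

def null_rate_by_segment(records, element, segment_key):
    """Null rate of `element` within each value of `segment_key`."""
    totals = defaultdict(int)
    nulls = defaultdict(int)
    for rec in records:
        segment = rec.get(segment_key)
        totals[segment] += 1
        # None and empty strings both count as null (an assumption).
        if rec.get(element) in (None, ""):
            nulls[segment] += 1
    return {seg: nulls[seg] / totals[seg] for seg in totals}

# Hypothetical order records, segmented by region.
orders = [
    {"region": "East", "ship_date": "2008-12-01"},
    {"region": "East", "ship_date": "2008-12-02"},
    {"region": "West", "ship_date": None},
    {"region": "West", "ship_date": "2008-12-03"},
    {"region": "West", "ship_date": ""},
    {"region": "West", "ship_date": None},
]

rates = null_rate_by_segment(orders, "ship_date", "region")
for region, rate in rates.items():
    print(f"{region}: {rate:.0%} of ship_date values are null")
```

Profiled as a whole, this sample shows half of the ship dates missing; segmented, it shows that every missing value sits in one region, which is the kind of detail an undirected report tends to bury.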

Posted December 23, 2008 1:51 PM