In the previous articles in this series (see Part 1 and Part 2), we saw that corporate data can be divided into three classes: structured, unstructured repetitive and unstructured nonrepetitive data. Each of these environments has its own characteristics when it comes to deriving business value.
First we begin with the structured environment. There are many ways to derive business value from the structured environment. There are online systems, where an individual unit of data can be accessed. There is business intelligence, where reports can be created and entire sets of data can be accessed and analyzed. There is a whole industry that has grown up around accessing and analyzing structured data.
In a sense, it is easiest to derive business value from structured data because each unit of data has its own demarcation and its own context. There is metadata of many varieties in the world of structured data. Indeed, it is arguable that structured data is structured because of its metadata infrastructure: records, attributes, indexes, keys and so forth. This metadata tells the business analyst what data is where. Because the data and its surrounding infrastructure are so carefully constructed, deriving business value from structured data is relatively easy.
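As a small illustration of how metadata guides access to structured data, the sketch below uses Python's built-in sqlite3 module. The table, column and index names are hypothetical and chosen only for this example. It shows the metadata being read directly, one unit of data being fetched by key (the online style), and the whole set being analyzed (the business intelligence style).

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_order ("
    " order_id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " amount REAL NOT NULL)"
)
conn.execute("CREATE INDEX idx_order_customer ON customer_order (customer)")
conn.executemany(
    "INSERT INTO customer_order (order_id, customer, amount) VALUES (?, ?, ?)",
    [(1, "Acme", 120.0), (2, "Globex", 75.5), (3, "Acme", 30.0)],
)

# The metadata itself can be queried: attribute names and declared types.
columns = [
    (row[1], row[2])
    for row in conn.execute("PRAGMA table_info(customer_order)")
]

# Online-style access: fetch one unit of data by its key.
one_order = conn.execute(
    "SELECT customer, amount FROM customer_order WHERE order_id = ?", (2,)
).fetchone()

# Business-intelligence-style access: analyze the entire set.
totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM customer_order GROUP BY customer"
    )
)
```

The analyst never has to guess where an amount or a customer name lives, because the schema says so.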
Unstructured Repetitive Data
Now let’s consider unstructured repetitive data. Deriving business value from unstructured repetitive data is quite different from deriving business value from structured data. The first difference is that relatively little unstructured repetitive data has business value. Depending on the particulars of the data, perhaps less than 0.001% of it has business value. So finding business value here is more a process of filtering and winnowing data than of looking across many different types of data.
Finding business value in the unstructured repetitive environment is usually done by means of a search engine. With a search engine, it is easy to qualify data based on very basic characteristics. That suits this sector of corporate data well, because there is very little descriptive information available on which to qualify the data.
In the unstructured repetitive sector of corporate information, there is typically scant information about the data to be analyzed. Unlike the structured sector, the unstructured repetitive environment offers little or no context or other description of the data. There are simple qualifications such as record size, the date and time the record was created, and perhaps a few other measurements. But there is nothing like the rich metadata infrastructure found in the structured environment. Because this infrastructure is so sparse, qualifying the data on anything but the most basic information is not possible. Instead, the business analyst must be satisfied with looking at the data through only a meager set of descriptions. So it is fortunate that business value in the unstructured repetitive environment is achieved through a process of filtering and qualification.
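The filtering and qualification process described above can be sketched in a few lines of Python. This is only a minimal illustration, not a search engine: the record layout, the sample payloads and the qualify function are all hypothetical. It qualifies records on the basic measurements mentioned (creation time and record size) and then applies a simple keyword match.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RawRecord:
    # Hypothetical layout: the only "metadata" is a timestamp and a size.
    created: datetime
    size_bytes: int
    payload: str


def make_record(created, payload):
    """Build a record, measuring its size from the payload itself."""
    return RawRecord(created, len(payload.encode("utf-8")), payload)


def qualify(records, start, end, min_size, keyword):
    """Yield only records that pass the basic filters and contain keyword."""
    for rec in records:
        if not (start <= rec.created < end):
            continue  # outside the time window of interest
        if rec.size_bytes < min_size:
            continue  # too small to carry a full reading
        if keyword in rec.payload:
            yield rec


records = [
    make_record(datetime(2024, 1, 1, 9, 0), "reading=ok sensor=17"),
    make_record(datetime(2024, 1, 1, 9, 5), "reading=FAULT sensor=17 temp=212"),
    make_record(datetime(2024, 1, 2, 9, 0), "reading=FAULT sensor=4 temp=198"),
    make_record(datetime(2024, 1, 1, 9, 10), "FAULT"),
]

selected = list(
    qualify(
        records,
        start=datetime(2024, 1, 1),
        end=datetime(2024, 1, 2),
        min_size=10,
        keyword="FAULT",
    )
)
```

Of the four sample records, only the second survives all three filters. In a real repetitive data set the surviving fraction would be far smaller, which is the point of the winnowing.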
In the next article in this series, we will look at achieving business value in the unstructured nonrepetitive environment.
SOURCE: Characteristics of Corporate Data Types
Recent articles by Bill Inmon