Listen carefully to the “big data” vendors and this is what you hear: “Let’s get rid of relational.” It is like courtiers in the castle whispering, “The king must die.” What’s going on here?
It is true that relational technology has been around for a long time and that it has two weaknesses that the big data vendors want to exploit:
- Relational technology does not handle text all that well
- Relational technology does not handle mammoth volumes of data all that well
So the big data vendors have something new and fun – let’s call it “Big Data DBMS.”
Now there is no question that “Big Data DBMSs” can handle volumes of data – look at Google if you don’t believe me. And it is true that “Big Data DBMSs” can handle text – that is essentially all they have. So big data is here to stay.
Let’s take a realistic look at what is happening here. It is true that “Big Data DBMSs” can handle huge amounts of data and can handle textual data. But – for a variety of reasons – the text found in “Big Data DBMSs” is not fit for analytical processing. “Big Data DBMSs” are optimized for collection and storage of lots of data. But when it comes time to analyze the text found in “Big Data DBMSs,” it is a different story entirely.
Why is the data found in “Big Data DBMSs” not fit for analysis? There are several reasons:
- There is no user-friendly selection tool for rummaging around “Big Data DBMSs.” MapReduce is hardly user friendly.
- The text found in “Big Data DBMSs” is not physically uniform; records vary in length and structure, and that lack of uniformity has always been a problem for processing large numbers of electronic records.
- Before text can be used for analysis, it must be disambiguated. It is potentially dangerous to use raw text for decision-making purposes.
And there are undoubtedly other reasons why text found in “Big Data DBMSs” is hard to analyze. The most serious of these objections is the need to disambiguate text before analysis can be done. Merely taking raw text and using it for analysis can produce very inaccurate and very misleading results. In a word, the context of text must be established before text can be used for analysis.
In order to use the text for decision-making purposes, you need to add context to the text. That is what is meant by disambiguation, and disambiguation is necessary for ANY text found anywhere, which certainly includes text found in big data. To simply jump into big data and start to use the raw text that is found there for analysis is to invite disaster.
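To make the point concrete, here is a toy sketch of what disambiguation means, not Inmon’s textual ETL itself. The abbreviation, contexts, and meanings below are invented for illustration: the same raw token (“ha”) can mean different things, and only surrounding context tells you which.

```python
# Toy disambiguation sketch: the abbreviation table and context labels
# are invented for illustration; real textual ETL uses far richer
# taxonomies and ontologies to establish context.

CONTEXT_MEANINGS = {
    "ha": {
        "cardiac": "heart attack",
        "neurology": "headache",
    },
}

def disambiguate(term: str, context: str) -> str:
    """Resolve an ambiguous term using its context; leave it raw if unknown."""
    meanings = CONTEXT_MEANINGS.get(term.lower())
    if not meanings:
        return term  # term not in the taxonomy: pass the raw text through
    return meanings.get(context, term)

print(disambiguate("ha", "cardiac"))    # heart attack
print(disambiguate("ha", "neurology"))  # headache
```

An analyst counting occurrences of “ha” in raw text would conflate two very different medical events; the disambiguated form is what is safe to analyze.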
Textual ETL disambiguates text. Textual ETL reads the text found in big data and refines it. Textual ETL then puts its output into a relational database. Once in a relational database, the text can be used openly by the industry or corporate analyst.
An interesting question is this: Does textual ETL have to place its results in a relational database? Of course not. Textual ETL can place its data anywhere. If the corporation wishes, textual ETL can place the refined data back into the big data database.
The reason textual ETL places the data in a relational database today is that that is where corporate business analysis takes place. But in the future, after textual ETL has disambiguated the raw data found in big data, there is no reason why textual ETL cannot place the refined text back into the big data environment, if that is what the corporation wishes. Textual ETL is agnostic.
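The final step described above, refined text landing in a relational database where ordinary SQL analysis can run against it, can be sketched as follows. This is an assumed, minimal illustration, not the actual textual ETL product; the table name, columns, and sample rows are invented.

```python
# Minimal sketch of the load step of a textual-ETL-style pipeline:
# disambiguated text is written to a relational table so the analyst
# can query it with plain SQL. All names and data are hypothetical.
import sqlite3

# Refined output: (document id, disambiguated term, established context)
refined = [
    ("doc-001", "heart attack", "cardiac"),
    ("doc-002", "headache", "neurology"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE refined_text (doc_id TEXT, term TEXT, context TEXT)"
)
conn.executemany("INSERT INTO refined_text VALUES (?, ?, ?)", refined)

# Once the text is in relational form, standard analytical SQL applies:
rows = conn.execute(
    "SELECT context, COUNT(*) FROM refined_text "
    "GROUP BY context ORDER BY context"
).fetchall()
print(rows)  # [('cardiac', 1), ('neurology', 1)]
```

The same refined rows could just as easily be written back into the big data environment; the point is that the text is queryable only after disambiguation, wherever it lands.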
But in any case, before text can be used for analysis, the text must be disambiguated.
SOURCE: Big Data, Text and Relational Database Management Systems
Recent articles by Bill Inmon