Blog: Ronald Damhof

Ronald Damhof

I have been a BI/DW practitioner for more than 15 years. In the last few years, I have become increasingly annoyed - even frustrated - by the lack of (scientific) rigor in the field of data warehousing and business intelligence. It is not uncommon for the knowledge worker to be disillusioned by the promise of business intelligence and data warehousing because vendors and consulting organizations create their "own" frameworks, definitions, super-duper tools etc.

What the field needs is more connectedness (grounding and objectivity) to the scientific community. The scientific community needs to realize the importance of increasing their level of relevance to the practice of technology.

For the next few years, I have decided to attempt to build a solid bridge between science and technology practitioners. As a dissertation student at the University of Groningen in the Netherlands, I hope to discover ways to accomplish this. With this blog I hope to share some of the things I learn in my search and begin discussions on this topic within the international community.

Your feedback is important to me. Please let me know what you think. My email address is Ronald.damhof@prudenza.nl.

About the author

Ronald Damhof is an information management practitioner with more than 15 years of international experience in the field.

His areas of focus include:

  1. Data management, including data quality, data governance and data warehousing;
  2. Enterprise architectural principles;
  3. Exploiting data to its maximum potential for decision support.

Ronald is an Information Quality Certified Professional (International Association for Information and Data Quality; one of the first 20 to pass this prestigious exam), Certified Data Vault Grandmaster (the only person in the world to have this level of certification), and a Certified Scrum Master. He is a strong advocate of agile and lean principles and practices (e.g., Scrum). You can reach him at +31 6 269 671 84, through his website at http://www.prudenza.nl/ or via email at ronald.damhof@prudenza.nl.

It is only by means of good and respectful discussion that knowledge and insight will evolve. This post should be regarded as such.

This post is my second reaction to the first article in a series of three written by Rick van der Lans, a highly respected thought leader in the field and publisher on the B-Eye-Network. The articles are titled 'The Flaws of the Classic Data Warehouse Architecture'.

The first article deals with the flaws of the classic data warehouse architecture (CDWA).

Rick identifies five flaws, which lead to a new architecture in articles two and three. This post addresses the second flaw.

- My reaction to flaw #1 can be read here.

Flaw 2 according to Rick
The CDWA stores a lot of redundant data. The more redundant the data, the less flexible the architecture is. We could simplify our data warehouse architectures considerably by getting rid of most of the redundant data. Hopefully, the new database technology on the market, such as data warehouse appliances and column-based database technologies, will decrease the need to store so much redundant data.

Rick commented on this flaw in his closing keynote at a BI event we had last week, stating basically that DWH professionals have done an extremely lousy job over the last decades by building these redundancy monsters. As in his article, he supported this argument with research by Nigel Pendse claiming that the average BI application needs only a fraction of the stored (redundant) data.

My reaction to flaw 2
First of all, I agree that new technologies can limit the volume of redundant data considerably.

But to say that in the last decades the data warehouse professional did an extremely lousy job because of the huge redundancy they created in their data warehouses... well, that's just plain stupid, and to the people who are applauding this statement I would like to say: 'I bet you never actually built a data warehouse'.

BI populism.....that's what it is.

As for the flexibility argument that more redundant data kills flexibility: hmm... it's a bit of a bs-argument, because flexibility is not only affected by redundant data. If I had built my data warehouses in the last decades without redundant data, I would have ended up with huge, complex transformation rules and a big strain on processing capacity. Both issues would have killed flexibility big time, and I am leaving aside the degradation of performance, ease of use, maintainability and testability of the system. But I agree: I would not have redundant data... I would not have any quality of service either... but who cares.

BI populism.....that's what it is.

But is the CDWA flawed by this redundancy problem? I do not think so at all. We would still need a datastore of some kind (Rick seems to acknowledge that by advocating the use of appliances), and we would still have several layers after this datastore, preparing the data for several different functionalities (reporting, mining, advanced analytics, data sharing with third parties, etc.). Let's take the datamart layer: will it disappear? I don't think so. The question is whether it needs to be materialized. And that's where new technology will be extremely valuable. It seems that Rick equates 'Architecture' one-to-one with 'Technical Architecture'.
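To make the point about materialization concrete, here is a minimal sketch, assuming SQLite through Python's standard sqlite3 module. The table and mart names (sales, mart_sales_virtual, mart_sales_materialized) and the helper functions are hypothetical and only serve as illustration. It shows one datamart defined twice: once as a view (no redundant copy, computed at query time) and once as a table (a redundant, precomputed copy). The consumer reads both in exactly the same way, so the datamart layer exists in both cases; only the technical choice differs.

```python
import sqlite3

# Hypothetical mart names, used only for this illustration.
MARTS = ("mart_sales_virtual", "mart_sales_materialized")


def build_demo():
    """Create an in-memory warehouse table plus two versions of one datamart."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, amount REAL);
        INSERT INTO sales VALUES
            ('north', 100.0), ('north', 50.0), ('south', 70.0);

        -- Virtual datamart: no redundant data, aggregated on every read.
        CREATE VIEW mart_sales_virtual AS
            SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

        -- Materialized datamart: a redundant copy, aggregated once at load.
        CREATE TABLE mart_sales_materialized AS
            SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
        """
    )
    return conn


def read_mart(conn, name):
    """Read a datamart; the caller does not know how it is implemented."""
    if name not in MARTS:
        raise ValueError(f"unknown datamart: {name}")
    query = f"SELECT region, total FROM {name} ORDER BY region"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    for mart in MARTS:
        print(mart, read_mart(conn, mart))
```

In this sketch the trade-off is visible as well: the view always reflects the current contents of sales but pays the aggregation cost on every read, while the table is cheap to read but has to be reloaded when sales changes.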

The hub-spoke architecture of the CDWA model is still extremely valid. Of course, technology within this architecture will evolve and will enable us to deliver an even better quality of service.


Posted June 14, 2009 3:57 AM