Originally published 26 April 2012
Practically all modern business applications that employ a data subsystem represent their data sets using data models. A data model is a discrete structured data representation of a real-world set of entities related to one another. Each entity (most often represented using a table) carries a set of characteristic attributes (described as data elements). Yet over time, our understanding of the ways that real-world objects and events are captured within a structured representation must adapt to the context in which the data sets are used and the ways the information satisfies the needs of the business processes.
Transaction processing systems need to have visibility into all aspects of a limited number of data instances at a time (such as a single customer buying a particular product). Analytical systems may need to support rapid aggregation to quickly respond to ad hoc queries relating to many data instances (such as “How many customers bought this specific product?”). In other words, there is bound to be a significant difference between how we use data in an operational/tactical manner (i.e., to “run the business”) and the ways we use data in a strategic manner (i.e., to “improve the business”).
The traditional modeling technique for operational systems revolves around the entity-relationship (E/R) model. Unfortunately, reporting and analytical applications are generally less well-suited to utilizing data structured in the entity-relational form (although there are emerging environments that make this less true). The alternative is to restructure transaction or event data using what is called “dimensional modeling” that is better organized to provide rapid responses to different types of queries and analyses.
The challenge in using the standard entity-relationship model for reporting and analysis lies in the interconnectedness of the entities and the corresponding complexity in accumulating the information necessary for hierarchical aggregations. The alternative dimensional modeling technique captures the basic unit of representation as a single multi-keyed entry in a slender fact table, with each key exploiting the relational model to refer to the different dimensions associated with those facts.
A maintained table of facts, each of which is related to a set of dimensions, is a much more efficient representation for data in a data warehouse and allows for information to be represented in a way that is more suitable to high-performance access. This is due to the ability to efficiently create aggregations and extractions of data specific to particular dimensional constraints quickly while being able to aggregate information.
|Time||Customer||Quantity||Item||Clerk|| Unit Price ||Total||Promotion||Location|
SOURCE: Data Models for Analytics
Recent articles by David Loshin