Single version of the truth: data warehousing mantra
During the past 20 years, organizations have approached reporting and business intelligence in a number of ways, with widely varying degrees of success. One decisive success factor has been the ability to establish a single version of the truth (nothing short of a mantra to serious practitioners) to ensure data accuracy and consistency.
To federate relational database (RDB) data from multiple systems, organizations have built data marts and data warehouses that unify the view. Extract, transform, load (ETL) processes convert data represented in different ways into a standard representation with a fixed data schema.
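As a rough sketch of what that conversion step looks like, the Python below maps records from two source systems into one fixed target schema. The source systems ("CRM" and "billing"), their field names, and the `Customer` target type are all hypothetical, invented only for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime


# The fixed target schema of the (hypothetical) warehouse.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    signup_date: date


def from_crm(row: dict) -> Customer:
    """Transform a row from an assumed CRM export (US-style dates, padded names)."""
    return Customer(
        customer_id=str(row["CustID"]),
        name=row["FullName"].strip(),
        signup_date=datetime.strptime(row["Signup"], "%m/%d/%Y").date(),
    )


def from_billing(row: dict) -> Customer:
    """Transform a row from an assumed billing system (split names, ISO dates)."""
    return Customer(
        customer_id=str(row["id"]),
        name=f'{row["first"]} {row["last"]}',
        signup_date=date.fromisoformat(row["created"]),
    )


# Extract: the same customer, represented differently in each source.
crm_rows = [{"CustID": 42, "FullName": " Ada Lovelace ", "Signup": "03/15/2012"}]
billing_rows = [{"id": "42", "first": "Ada", "last": "Lovelace", "created": "2012-03-15"}]

# Transform and load: both representations collapse into one standard record.
warehouse = {from_crm(r) for r in crm_rows} | {from_billing(r) for r in billing_rows}
```

Once both sources pass through their transform functions, the two differently shaped rows become one identical `Customer` record, which is the "single version" the warehouse is meant to hold.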
In a relational database, data is stored in tables as rows and columns, and relationships between data are created using multiple tables linked by primary and foreign keys. For example, a one-to-one relationship (one employee, one phone number) can be represented by placing both pieces of data in a single table. A one-to-many relationship (one employee, multiple phone numbers) uses two tables, and a many-to-many relationship uses three: one for each entity and a third that links them.
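The three shapes can be sketched with Python's built-in `sqlite3` module. The table and column names here are illustrative assumptions, and the `project` table is a hypothetical second entity added only to show a many-to-many case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One-to-one: the phone number lives in the same table as the employee.
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    phone       TEXT
);

-- One-to-many: a second table whose foreign key points back to the employee.
CREATE TABLE employee_phone (
    phone_id    INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    phone       TEXT NOT NULL
);

-- Many-to-many: two entity tables plus a third linking table.
CREATE TABLE project (
    project_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL
);
CREATE TABLE employee_project (
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    project_id  INTEGER NOT NULL REFERENCES project(project_id),
    PRIMARY KEY (employee_id, project_id)
);
""")

conn.execute("INSERT INTO employee VALUES (1, 'Ada', '555-0100')")
conn.executemany(
    "INSERT INTO employee_phone (employee_id, phone) VALUES (?, ?)",
    [(1, "555-0101"), (1, "555-0102")],
)
conn.executemany("INSERT INTO project VALUES (?, ?)", [(10, "Warehouse"), (11, "Reporting")])
conn.executemany("INSERT INTO employee_project VALUES (?, ?)", [(1, 10), (1, 11)])

# Following a relationship means joining on the keys.
phones = [row[0] for row in conn.execute(
    "SELECT phone FROM employee_phone WHERE employee_id = 1 ORDER BY phone"
)]
projects = [row[0] for row in conn.execute(
    "SELECT p.title FROM project p "
    "JOIN employee_project ep ON ep.project_id = p.project_id "
    "WHERE ep.employee_id = 1 ORDER BY p.title"
)]
```

Note that each kind of relationship is expressed by the table layout itself rather than by the data.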
Single version of the truth: challenge in today’s complex information environment
Thus, with relational database technology, the physical structure is used to represent the relationships in the data. But what if the relationships change? What happens when the phone number column in the employee table needs to be expanded to include both office and cell phone numbers? The solution is to create a new table that stores the phone type and phone number data. A foreign key links each phone number in the phone number table to an employee in the employee table. The one-to-one relationship becomes a one-to-many relationship, which means restructuring the data and reworking whatever depends on the old structure.
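A minimal sketch of that restructuring, again using Python's `sqlite3` module, is shown here. The table and column names are assumptions, as is the decision to treat every existing number as an office number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Starting point: one phone number per employee, stored in the employee table.
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    phone       TEXT
);
INSERT INTO employee VALUES (1, 'Ada',   '555-0100');
INSERT INTO employee VALUES (2, 'Grace', '555-0200');
""")

# The change: a new table holding phone type and number, keyed to the employee.
# Existing numbers are copied across and labeled 'office' (an assumption).
conn.executescript("""
CREATE TABLE employee_phone (
    employee_id  INTEGER NOT NULL REFERENCES employee(employee_id),
    phone_type   TEXT NOT NULL,
    phone_number TEXT NOT NULL,
    PRIMARY KEY (employee_id, phone_type)
);
INSERT INTO employee_phone (employee_id, phone_type, phone_number)
SELECT employee_id, 'office', phone FROM employee WHERE phone IS NOT NULL;
""")

# Only now can a second number be recorded for an employee.
conn.execute("INSERT INTO employee_phone VALUES (1, 'cell', '555-0199')")

# Every query that used to read employee.phone must be rewritten as a join.
rows = conn.execute(
    "SELECT e.name, p.phone_type, p.phone_number "
    "FROM employee e JOIN employee_phone p ON p.employee_id = e.employee_id "
    "ORDER BY e.employee_id, p.phone_type"
).fetchall()
```

The sketch leaves the old `phone` column in place. In a real system that column would also have to be retired, and every report, ETL job, and application reading it would need to change.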
Moreover, today’s organization committed to competing on analytics must support many different stakeholders whose required views are likely to differ. Some will be doing data discovery, using data in unpredictable ways. So, is maintaining a single version of the truth possible, or even desirable? Some believe that maintaining a single version of the truth gets in the way of valuable predictive analytics. In their view, it is at best a futile exercise and at worst potentially harmful to data discovery.
In today’s complex analytic environment with data coming from more sources in a wider range of formats, and with more users relying on more varied types of data analysis, is it:
– Still important to provide a single version of the truth?
– No longer practical to provide a single version of the truth?
– Possible to maintain data consistency without a single version of the truth?
I would argue that rigid adherence to a single data schema is neither practical nor desirable. But that doesn’t have to mean abandoning all attempts to maintain data consistency. Today’s analytic competitor needs not just one view, but multiple consistent interpretations of business data for all the different stakeholders. That includes human and computer-based users, applications, processes and services that need to use (and reuse) data within their own contexts. Enabling multiple consistent interpretations of complex data via semantic reconciliation of data schemas, metadata, business vocabulary and policies is both possible and desirable.
Semantic technologies are emerging as a means for organizations to evolve from providing a single version of the truth to enabling multiple consistent interpretations of business data by all the different stakeholders with appropriate context. They offer a solution for applications whose data structures need to change or be extended on a regular basis, or whose data definitions must provide meaning and context so that a computer can understand the information without requiring a human to interpret it. Semantic technologies not only reconcile data definitions and business rules, but also provide a more flexible way to represent relationships between data elements.
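To give a feel for that flexibility, here is a toy sketch in plain Python, loosely modeled on the subject-predicate-object triples used by semantic data models such as RDF. It is not an RDF library; the identifiers (`emp:1`, `hasPhone`, `hasCellPhone`, `subPropertyOf`) are made up for illustration, and the lookup handles only one level of sub-property.

```python
# Each fact is a (subject, predicate, object) triple; there is no fixed table layout.
triples = {
    ("emp:1", "hasName", "Ada"),
    ("emp:1", "hasPhone", "555-0100"),
}


def objects(subject, predicate):
    """Return all objects asserted for exactly this subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def objects_with_subproperties(subject, predicate):
    """Like objects(), but also follows predicates declared as direct sub-properties."""
    predicates = {predicate} | {
        s for s, p, o in triples if p == "subPropertyOf" and o == predicate
    }
    return sorted(o for s, p, o in triples if s == subject and p in predicates)


# A new kind of relationship is just another fact; nothing is restructured.
triples.add(("emp:1", "hasCellPhone", "555-0199"))

# The meaning of the new term is itself data: a cell phone is a kind of phone.
triples.add(("hasCellPhone", "subPropertyOf", "hasPhone"))

office_view = objects("emp:1", "hasPhone")
any_phone_view = objects_with_subproperties("emp:1", "hasPhone")
```

The two views answer different questions over the same facts: one stakeholder sees only what was recorded as `hasPhone`, while another sees every number that counts as a phone under the shared vocabulary.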
The need to analyze data in complex information environments will drive increased use of semantic technologies. In fact, Gartner identified semantic technologies as a top technology trend impacting information infrastructure in 2013; one which, along with things like big data, in-memory computing and NoSQL DBMSs, will play a key role in modernizing information management.
What do you think?
In subsequent blog posts I’ll discuss the definition and examples of semantic technologies, along with popular use cases and applications. Feel free to contribute input, questions, and discussion topics. E-mail me at [email protected].