Subscribe to the newsletter!

When do you need a data warehouse?

The transformations to mold the data from an application in a form that is better suited for data analysis is done in a datawarehouse.

Data comes to fruition when an employee puts the data that is generated by a process in an application, and the data finds its way to the underlying database. But these databases are not well suited to use for data analysis.

Relational model

These databases usually have a relational data model. This type of model was invented to process bank transactions. It is very good in processing large amounts of transactions, but not in generating insights in the data. There are different reasons why this is the case.

Challenges of the relation model

Tables in a relational model can take different roles. A table 'relations' can contain suppliers as well as clients. But in our analysis we want to know for example how profitable a certain group of clients is, so we need to split up this table.

The evolution in time between the different measuring points in a process can bring a lot of insight, but the application may not always store these historical data points because they are not needed to go through the process. 

It can very well be that you use different applications to help your customer. For example, one to make orders and another one to send invoices. This order-to-cash (OTC) is an important process to track, but to make this analysis possible you have to combine data from one application with data from another.

Some of these things you could solve in the code of your data analysis or in the front end of your data visualization tool, but this brings certain risks with regard to maintainability and scalability. Moreover, the more complex challenges, that bright the most value for the business, don't belong in these tools. That's why often an extra step is added where the necessary transformations can be handled to efficiently and correctly analyze and visualize the data.

Dimensional model

The transformation to go from a relational model in an application database to a model that is better suited for analysis is done in a data warehouse (DWH). The data model that is used here is a dimensional model or a data vault model.

Properties of a data warehouse

A data warehouse contains data about a certain subject relevant to the business (subject-oriented). It integrates data from different sources (integration) and keeps track of historical data (time-variant). As soon as data is written to the datawarehouse, the data can no longer be changed (nonvolatile). By facilitating data analysis and visualization the data warehouse supports the decision making process in an organization.

Here to stay

Despite the many innovations and developments in the field of data analysis, the need to harmonize and integrate data coming from different applications within a specific context - and thus the datawarehouse - will remain important for a while.

Disclaimer

We are aware of the importance to capture and use unstructured data and network data that is stored in data lakes and that is best represented by NOSQL and Graph databases. For the sake of simplicity we make abstraction of this type of data for now. Let's get our small data in order first, before taking on our big data.