Data comes to fruition when an employee puts the data that is generated by a process in an application, and the data finds its way to the underlying database. But these databases are not well suited to use for data analysis.
These databases usually have a relational data model. This type of model was invented to process bank transactions. It is very good in processing large amounts of transactions, but not in generating insights in the data. There are different reasons why this is the case.
Tables in a relational model can take different roles. A table 'relations' can contain suppliers as well as clients. But in our analysis we want to know for example how profitable a certain group of clients is, so we need to split up this table.
The evolution in time between the different measuring points in a process can bring a lot of insight, but the application may not always store these historical data points because they are not needed to go through the process.
It can very well be that you use different applications to help your customer. For example, one to make orders and another one to send invoices. This order-to-cash (OTC) is an important process to track, but to make this analysis possible you have to combine data from one application with data from another.
Some of these things you could solve in the code of your data analysis or in the front end of your data visualization tool, but this brings certain risks with regard to maintainability and scalability. Moreover, the more complex challenges, that bright the most value for the business, don't belong in these tools. That's why often an extra step is added where the necessary transformations can be handled to efficiently and correctly analyze and visualize the data.
The transformation to go from a relational model in an application database to a model that is better suited for analysis is done in a data warehouse (DWH). The data model that is used here is a dimensional model or a data vault model.
A data warehouse contains data about a certain subject relevant to the business (subject-oriented). It integrates data from different sources (integration) and keeps track of historical data (time-variant). As soon as data is written to the datawarehouse, the data can no longer be changed (nonvolatile). By facilitating data analysis and visualization the data warehouse supports the decision making process in an organization.
Despite the many innovations and developments in the field of data analysis, the need to harmonize and integrate data coming from different applications within a specific context - and thus the datawarehouse - will remain important for a while.
We are aware of the importance to capture and use unstructured data and network data that is stored in data lakes and that is best represented by NOSQL and Graph databases. For the sake of simplicity we make abstraction of this type of data for now. Let's get our small data in order first, before taking on our big data.