How to use data engineering to process the results of a data maturity assessment

Defining what you mean when you talk about data can be tricky, especially when there is a philosopher in the room. That's why I use the following working definition of data in a business context:

Data comes to fruition when a software application captures and stores the signals that the execution of a business process generates.

Now we can use this data to analyze what is going on in the business. We can define which numbers are relevant to follow up on our business strategy. We can define measuring points in the process to get these numbers. We can plot the evolution of these well-chosen numbers in intuitive dashboards. Based on these dashboards, we can take actions to refine the business process and the underlying software application, and then see the impact of these changes on the numbers.

And so on, until we have a high-performing business that hits its targets, realizes its strategy and fulfills its mission.

But first, we need to get people on board with this ambitious vision. We need to get people on the same page. We need a way to talk about this.

That's why I developed a data maturity assessment. When I talk to people, it's clear they have very different ideas, aspirations and fears about data projects. Stereotypically, you could say the IT guys focus on tooling and security. The sales guy wants something flashy. The HR people want something simple. The CFO cares about total cost of ownership. And the CEO fears missing out on the opportunities that more advanced data analytics bring.

The assessment gauges maturity around 3 topics: data culture, data strategy and data architecture. If you like, you can take it here. In what follows, I focus on the data engineering that comes into play when processing the results. All software applications and databases that I mention below are open source. They are all self-hosted on Hetzner cloud servers running the Ubuntu 20.04 operating system.

1. The business process in this case is the assessment of the data maturity. The main goal is to start a discussion around data. I wanted to keep it short and simple, not necessarily to design a statistically sound questionnaire. As the software application supporting this process, I used LimeSurvey.

2. The measuring points are the values of the answers, which I configured behind the scenes in LimeSurvey on a scale from 0 to 4. To generate insights later, I had to stick to the question types in LimeSurvey that support assessment values. In our case, that is an array.

3. The database of LimeSurvey is very well suited for surveys, but not for analytics. That is why I designed a couple of views in the MySQL database of LimeSurvey to denormalise the data. I needed to build a couple of intermediate views before I could get all the data in the format I wanted in one table. This step took me the most time in the whole project.

4. Now we can connect Metabase to the database and use the view we created there to visualize the data. While doing this, we noticed that answer_value was a string, so we adjusted the view to cast it to an integer. Now we can use answer_value in aggregations, such as the average answer per topic.

5. Metabase allows us to filter the different visualizations based on one of the fields in the data. This lets us quickly provide feedback to the participants of the survey by filtering on their name. We can also see how the CEOs or CFOs of organizations with between 2 and 10 employees think about data maturity.
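Steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not the actual views: the tables participants, questions and answers, the view name assessment_flat and the sample rows are all hypothetical, and the real LimeSurvey schema is more involved. SQLite from the Python standard library stands in for MySQL so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables; NOT the real LimeSurvey schema.
    CREATE TABLE participants (id INTEGER PRIMARY KEY, name TEXT, role TEXT);
    CREATE TABLE questions (id INTEGER PRIMARY KEY, topic TEXT, title TEXT);
    CREATE TABLE answers (participant_id INTEGER, question_id INTEGER,
                          answer_value TEXT);  -- stored as a string

    INSERT INTO participants VALUES (1, 'Alice', 'CEO'), (2, 'Bob', 'CFO');
    INSERT INTO questions VALUES
        (1, 'data culture', 'Q1'),
        (2, 'data culture', 'Q2'),
        (3, 'data strategy', 'Q3');
    INSERT INTO answers VALUES
        (1, 1, '4'), (1, 2, '2'), (1, 3, '1'),
        (2, 1, '0'), (2, 2, '2'), (2, 3, '3');

    -- Denormalised view: one flat row per answer, value cast to an integer.
    -- In MySQL the cast would be written CAST(a.answer_value AS UNSIGNED).
    CREATE VIEW assessment_flat AS
    SELECT p.name, p.role, q.topic, q.title AS question,
           CAST(a.answer_value AS INTEGER) AS answer_value
    FROM answers a
    JOIN participants p ON p.id = a.participant_id
    JOIN questions q ON q.id = a.question_id;
""")


def average_per_topic(conn, name=None):
    """Average answer per topic, optionally filtered on one participant."""
    sql = "SELECT topic, AVG(answer_value) FROM assessment_flat"
    params = ()
    if name is not None:
        sql += " WHERE name = ?"
        params = (name,)
    sql += " GROUP BY topic ORDER BY topic"
    return dict(conn.execute(sql, params).fetchall())


overall = average_per_topic(conn)
alice = average_per_topic(conn, name="Alice")
print(overall)
print(alice)
```

The optional name argument plays the role of the dashboard filter: the same aggregation runs over everyone or over a single participant.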

If several colleagues from the same organization take the survey, we can filter on the different departments. This facilitates a discussion on possible differences in maturity between departments.
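To make the department comparison concrete, here is a small sketch in plain Python. It assumes rows shaped like a flat, denormalised view with one row per answer; the field names organization, department, topic and answer_value, the sample data and both helper functions are hypothetical and only serve as illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: one row per answer, values on the 0 to 4 scale.
rows = [
    {"organization": "Acme", "department": "IT", "topic": "data architecture", "answer_value": 3},
    {"organization": "Acme", "department": "IT", "topic": "data architecture", "answer_value": 4},
    {"organization": "Acme", "department": "IT", "topic": "data culture", "answer_value": 2},
    {"organization": "Acme", "department": "Sales", "topic": "data architecture", "answer_value": 1},
    {"organization": "Acme", "department": "Sales", "topic": "data architecture", "answer_value": 2},
    {"organization": "Acme", "department": "Sales", "topic": "data culture", "answer_value": 3},
    {"organization": "Globex", "department": "HR", "topic": "data culture", "answer_value": 4},
]


def topic_averages_by_department(rows, organization):
    """Average answer per (department, topic) within one organization."""
    grouped = defaultdict(list)
    for row in rows:
        if row["organization"] == organization:
            grouped[(row["department"], row["topic"])].append(row["answer_value"])
    return {key: mean(values) for key, values in grouped.items()}


def department_gaps(averages):
    """Per topic, the spread between the highest and lowest department average."""
    per_topic = defaultdict(list)
    for (_department, topic), avg in averages.items():
        per_topic[topic].append(avg)
    return {topic: max(values) - min(values) for topic, values in per_topic.items()}


averages = topic_averages_by_department(rows, "Acme")
gaps = department_gaps(averages)
print(averages)
print(gaps)
```

A large gap on a topic is a candidate talking point: in this made-up data, IT and Sales disagree most on data architecture.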