
Collibra in the wild

I was curious to learn how the data management tool Collibra is implemented in practice.

Demo cases and the information on the website are usually written for best-case scenarios and are not always representative of the real world. I was lucky that the Chief Data Officer at a Belgian retail bank kindly set up a call with their Collibra implementation partner, and a couple of weeks later I had the opportunity to take a guided tour of their environment.

Overall, I was impressed by the sheer magnitude of the data governance program at the bank and by Collibra's capacity and flexibility to handle it smoothly. The speed at which regulatory requirements such as BCBS 239 and GDPR were introduced forced banks to put data governance systems in place very quickly.

This explains why banks were very early adopters of data governance platforms such as Collibra. There is certainly room for improvement in the way Collibra is implemented at this Belgian retail bank. Possible next steps would be to further integrate business terms and logical data models, and to align the different technologies so that the data catalogs and lineage can be automated.

What does a data platform do?

When I tried to describe, starting from a blank sheet, what I would expect a data management platform to do, I came up with two things: a data catalog and data lineage. A quick look at the Collibra website taught me that I had missed the data quality component of such a platform.

The Collibra website lists the following products: Data Governance, Data Catalog, Data Privacy, Data Lineage and Data Quality. To me, a data catalog is the starting point for a data dictionary, of which data privacy is an attribute. But I don’t want to get lost in a semantic discussion around these terms, so I’ll try to keep the language as crisp as possible, as Jo taught us at Deloitte, and assume a data platform should do three things: data dictionary, data lineage and data quality.

Data Governance Portal

Collibra at a Belgian retail bank

Collibra is oriented towards business users, while other data governance tools such as Microsoft Purview or Informatica Axon are built more for technical IT specialists. The Collibra start page at this bank reflects this approach: it presents itself as a hub where users can explore topics such as Data inventory, Data sharing, Data quality, Data usage and Enterprise Data Model. Nonetheless, some of those topics seem aimed at more technical users.

Portal

Data dictionary

The data dictionary overview lists the applications, the tribe each one belongs to, its asset manager, a link to its data dictionary and a description. The integration of the Enterprise Data Models in SAP PowerDesigner into Collibra is underway. If the link between the conceptual model of business terms and the logical model in PowerDesigner is well documented, Collibra will be able to build the data catalog automatically. This is already the case for Business Objects reports, where the field names correspond to the names of the business data elements.
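To illustrate how matching names can drive automatic cataloguing, here is a minimal Python sketch, assuming a plain list of business terms and a list of report field names. The function names, the example terms and the normalization rule are hypothetical illustrations, not Collibra's API or the bank's actual matching logic.

```python
import re
from typing import Dict, List


def normalize(name: str) -> str:
    """Lower-case a name and drop spaces, underscores and punctuation, so that
    'EMAIL_ADDRESS', 'Email Address' and 'EmailAddress' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def link_fields_to_terms(fields: List[str], glossary: List[str]) -> Dict[str, str]:
    """Return {report field: business term} for every field whose normalized
    name matches a term in the (hypothetical) business glossary."""
    index = {normalize(term): term for term in glossary}
    links = {}
    for field in fields:
        term = index.get(normalize(field))
        if term is not None:  # unmatched fields still need manual documentation
            links[field] = term
    return links


glossary = ["Email Address", "Customer Name", "Account Number"]
fields = ["EMAIL_ADDRESS", "CustomerName", "Branch Code"]
print(link_fields_to_terms(fields, glossary))
# {'EMAIL_ADDRESS': 'Email Address', 'CustomerName': 'Customer Name'}
```

Exact name matching only works when naming conventions are followed consistently, which is why a well-documented link between the conceptual and logical models matters.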

One possible use case: a business intelligence developer who wants to use certain data in a report can shop for it in the data catalog and request access from the referential, the person who makes sure the data is used for a legitimate interest and that a usage period is defined.
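The check the referential performs could be modeled roughly as follows. This is a minimal sketch assuming an access request carries a stated legitimate interest and a usage period; the AccessRequest class and the review function are hypothetical and not part of Collibra.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class AccessRequest:
    """Hypothetical access request for a data element found in the catalog."""
    requester: str
    data_element: str
    legitimate_interest: str       # why the data is needed
    valid_from: date
    valid_until: Optional[date]    # end of the agreed usage period


def review(request: AccessRequest) -> List[str]:
    """Return the reasons to reject the request; an empty list means it can be approved."""
    problems = []
    if not request.legitimate_interest.strip():
        problems.append("no legitimate interest stated")
    if request.valid_until is None:
        problems.append("no end date for the usage period")
    elif request.valid_until <= request.valid_from:
        problems.append("usage period is empty or reversed")
    return problems


ok = AccessRequest("bi.developer", "Email Address", "customer churn report",
                   date(2024, 1, 1), date(2024, 12, 31))
print(review(ok))  # []
```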

Another use case is to install data governance as a gate that every new application has to pass during development, just as security and architecture are gates that have to be passed. For the data governance gate, the requirement would be that the privacy and ethics of the data usage are guaranteed and well documented.


Data lineage

In the data inventory you can find the data lineage, meaning the way the different data elements flow through the different systems. In the example, you can see which applications use the email address. This allows for the kind of traceability that is mandatory for banks. You can also identify the master source of each business term, and technically it would be possible to implement master data management in Collibra using workflows.
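Declared lineage can be treated as a directed graph of systems, so answering "where is the email address used?" becomes a graph traversal, and a candidate master source is a system that never receives the element from elsewhere. A minimal sketch, assuming hypothetical system names and a hand-entered edge list for the email address element:

```python
from collections import deque
from typing import Dict, List

# Hypothetical declared lineage for the "Email Address" data element:
# each key is a system, each value lists the systems it sends the element to.
EMAIL_LINEAGE: Dict[str, List[str]] = {
    "CRM": ["DataWarehouse", "MarketingTool"],
    "DataWarehouse": ["BusinessObjects"],
    "MarketingTool": [],
    "BusinessObjects": [],
}


def downstream(source: str, edges: Dict[str, List[str]]) -> List[str]:
    """Breadth-first traversal: every system the element reaches from `source`."""
    seen = {source}
    queue = deque([source])
    reached = []
    while queue:
        system = queue.popleft()
        for target in edges.get(system, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
                reached.append(target)
    return reached


def master_sources(edges: Dict[str, List[str]]) -> List[str]:
    """Systems that never receive the element from another system."""
    receivers = {t for targets in edges.values() for t in targets}
    return [system for system in edges if system not in receivers]


print(downstream("CRM", EMAIL_LINEAGE))  # ['DataWarehouse', 'MarketingTool', 'BusinessObjects']
print(master_sources(EMAIL_LINEAGE))     # ['CRM']
```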


At the bank, this data lineage is not generated automatically from an analysis of the data transfer processes, as I had expected, but was collected declaratively, by interviewing the people responsible about how they use the different data elements. Technically, it is possible to write a parser for the SQL scripts that extracts the lineage information from the databases and ETL tools, but this is not yet in place at this Belgian retail bank.
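To give an idea of what such a parser does, here is a deliberately naive sketch that extracts table-level lineage from INSERT ... SELECT statements with regular expressions. It is a toy under strong assumptions: it ignores CTEs, subqueries, views and column-level lineage, which a production parser built on a proper SQL grammar would need to handle. The table names are hypothetical.

```python
import re
from typing import List, Tuple

# Target table of an INSERT and the source tables it reads from.
TARGET = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCES = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql: str) -> List[Tuple[str, str]]:
    """Return (source_table, target_table) edges for each INSERT ... SELECT statement."""
    edges: List[Tuple[str, str]] = []
    for statement in sql.split(";"):  # naive: assumes no ';' inside string literals
        target = TARGET.search(statement)
        if not target:
            continue  # only INSERT statements move data between tables here
        for source in SOURCES.findall(statement):
            edge = (source, target.group(1))
            if edge not in edges:
                edges.append(edge)
    return edges


script = """
INSERT INTO dwh.customer_contact
SELECT c.id, c.email FROM crm.customer c JOIN crm.consent k ON k.customer_id = c.id;
INSERT INTO mart.marketing_list SELECT email FROM dwh.customer_contact;
"""
for source, target in table_lineage(script):
    print(source, "->", target)
```

The resulting edges have the same shape as the declared lineage collected through interviews, so the two sources could in principle be compared to spot gaps.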

Data quality

Data quality rules are developed on request, and their results are displayed in Collibra. It is interesting to see the different data quality dimensions the rules test: timeliness, completeness, validity, uniqueness, consistency and accuracy. Given the regulatory environment banks operate in, data quality is very important.
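As an illustration of how rules map to dimensions, here is a minimal sketch of three of them (completeness, validity and uniqueness) computed over a column of email addresses. Scoring each dimension as a share between 0 and 1 and the simple email pattern are assumptions for the example, not the bank's rule definitions.

```python
import re
from typing import List

# Deliberately loose email pattern: something@something.something, no spaces.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(values: List[str]) -> float:
    """Share of values that are filled in."""
    if not values:
        return 1.0
    return sum(v not in (None, "") for v in values) / len(values)


def validity(values: List[str], pattern: re.Pattern = EMAIL) -> float:
    """Share of filled-in values that match the expected format."""
    filled = [v for v in values if v]
    return sum(bool(pattern.match(v)) for v in filled) / len(filled) if filled else 1.0


def uniqueness(values: List[str]) -> float:
    """Share of distinct values among the filled-in values."""
    filled = [v for v in values if v]
    return len(set(filled)) / len(filled) if filled else 1.0


emails = ["ann@bank.be", "bob@bank.be", "", "not-an-email", "ann@bank.be"]
print(completeness(emails), validity(emails), uniqueness(emails))  # 0.8 0.75 0.75
```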

Machine learning can be used to classify data: the algorithm suggests a data type for a data element, for example an email address, together with a percentage of certainty that the field actually is of that type.
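The idea of suggesting a type together with a certainty percentage can be sketched without machine learning. The sketch below swaps in a simple rule-based approach: it scores each candidate type by the share of sample values that match its pattern. The candidate types and patterns are hypothetical; a real classifier would learn from labeled data instead of relying on fixed regular expressions.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical candidate types and the patterns used to recognise them.
PATTERNS = {
    "email address": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "Belgian IBAN": re.compile(r"^BE\d{14}$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def suggest_type(samples: List[str]) -> Tuple[Optional[str], float]:
    """Return (suggested type, confidence), where confidence is the share of
    non-empty samples that match the suggested type's pattern."""
    filled = [s.strip() for s in samples if s and s.strip()]
    if not filled:
        return None, 0.0
    scores = {name: sum(bool(p.match(s)) for s in filled) / len(filled)
              for name, p in PATTERNS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, 0.0  # no candidate type fits any sample
    return best, scores[best]


print(suggest_type(["ann@bank.be", "bob@bank.be", "n/a", "carl@bank.be"]))
# ('email address', 0.75)
```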


Conclusion

Keep in mind that this is a very specific, early instance of Collibra that does not have all the latest features at its disposal. To get a better view of the newest additions and extensions, I recommend having a look at Collibra University. One of the reasons I wanted to better understand what Collibra does was to see whether it would make sense for small or medium-sized enterprises to implement a data governance platform. Taking the effort in work and budget into account, I only see a use case for organizations of a certain size in sectors with very stringent data regulations.