The Big Data Catalog Market Comparison – Part 1
Timm Grosser, Senior Analyst for Data Management, shares his conclusions from BARC’s recent webinar, Portals to Enterprise Data Knowledge – 6 Data Catalog Products in Direct Comparison
Introduction
In BARC’s recent webinar, six vendors went head to head in a direct comparison – a webinar review by our data management expert Timm Grosser.
Data catalogs support finding and understanding data. In an extended sense, they also promote data governance and even direct data access. This sounds simple, but it is not.
Implementing and operating a data catalog takes considerable effort. Data catalogs provide valuable structure, automation and functionality to successfully implement and evolve metadata projects. But why all the effort?
The main advantage of data catalogs is that they make business departments more agile and flexible while ensuring data consistency, transparency and security. Being able to find and understand one’s own data is essential, and the demand for data catalogs is correspondingly high.
The market for data catalogs is growing steadily and we are already aware of more than 80 different solutions.
In the BARC webinar, three data catalog market leaders – Alation, Collibra and Informatica – competed against three challengers – dataspot, Synabi and Zeenea – to demonstrate their products live in a head-to-head tool comparison.
In this blog, we look at the performance of the three market leaders: Alation, Collibra and Informatica. In part two of the blog series, we will talk about the three challengers and in part three, we will look back at the results.
The webinar is available to watch on-demand. You can access the recording here.
A scenario ensures better comparability
To make the tools easier to compare, we set a scenario in which the vendors had to demonstrate how their solution supports data democracy.
Each vendor had 20 minutes to present their solution, live in the tool wherever possible. Afterwards, attendees could put their questions to the vendor in a five-minute Q&A session. The aim was to show the differentiating features of each solution for this use case, and the shared scenario made the differences between the products clearly visible.
Informatica
Kash Mehdi, Data Governance & Privacy Segment Leader, presented Informatica’s Data Governance Catalog modules. In last year’s webinar, we still saw different tools; this time, the workflow in the tool seemed consistent. His solution to the data democracy task centered on Informatica’s Data Marketplace module.
We saw a consistent data shopping experience. Everything was covered, from metadata search, shopping and provisioning through to monitoring. To help the data shopper navigate the marketplace, Informatica showed clear search functions as well as tools, such as ratings, for evaluating and narrowing down the set of hits.
Once the data set of choice has been identified, data access can be requested via a checkout button. How the request is handled can be configured via the UI. For a seamless shopping experience, appropriate transport mechanisms must be configured in advance and can then be controlled.
From the perspective of the data steward, a set of governance functionalities (data quality, policies, release processes, task dashboards) was presented. Interesting here was the mention of the built-in ML functionality to help classify or find data. Also worth mentioning are the overlay effects that allow contextual information (e.g., DQ metrics) to be displayed at process steps in a business process.
Informatica was also the only vendor to show individual catalog areas for different business areas.
In the Q&A, Informatica was asked about measuring data quality. It was also asked if there was a report that published semantic incompatibilities between data. You can watch the Q&A and the full webinar here.
Collibra
Paul Dietrich (Area Vice President DACH) and Guido Bilstein (Senior Solution Engineer) gave a good and clearly structured presentation of Collibra’s “one stop shop for data”. Quick access to the data catalog was provided by ‘Collibra for Desktop’. This module allows the user to mark a term (word) in any application and call up the appropriate catalog entry via a hotkey.
Afterwards, an illustrative, collaborative process was shown in which a data shopper, a data steward and a business analyst exchanged information with each other.
Many functions were shown, such as classification by machine learning during metadata reading, data lineage, profiling and policies.
Overall, a good insight into the workflow of the tool was given. In terms of data access, Collibra solved the task with a “Request for Data” button that can trigger a release process according to company policies.
The vote of the live audience on the top 3 strengths of Collibra’s tool
The Q&A focused on the connection between business and technical metadata and on automated lineage. You can watch the Q&A and the full webinar here.
Alation
Christian Herzog, Senior Sales Engineer at Alation, highlighted the relevance of people and an organization to a data democracy. He also stated that a data culture is needed for this to happen. At the beginning, the presentation showed a typo in the search of the data catalog, which was promptly corrected by the system and led to the correct hits. The intentional typo, however, was quite simple.
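To illustrate why a simple typo is an easy test for a catalog search, here is a minimal sketch of typo-tolerant term lookup using Python's standard `difflib` module. It does not show how Alation implements its search; the catalog terms, the function name `suggest_terms` and the similarity cutoff are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical catalog entries; not taken from any vendor's product.
CATALOG_TERMS = ["customer", "customer_orders", "revenue", "supplier", "invoice"]


def suggest_terms(query, terms=CATALOG_TERMS, limit=3, cutoff=0.7):
    """Return catalog terms that closely resemble a possibly misspelled query.

    The cutoff (0..1) is an assumed similarity threshold; lower values
    tolerate more severe typos but also return more false hits.
    """
    return get_close_matches(query.lower(), terms, n=limit, cutoff=cutoff)


# A single dropped letter still resolves to the intended term.
print(suggest_terms("custmer"))
```

A one-letter typo such as this stays well above the similarity threshold, which is why it says little about how a search copes with harder cases such as synonyms or abbreviations.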
Given Alation’s recent takeover of Lyngo Analytics, a specialist in natural language processing (NLP), I am curious to see what will be possible in the future. Here, too, the desired personas were presented in an easy-to-understand manner. Alation emphasized that users learn by using the data catalog: warnings, endorsements and further information help them build up their data competence as they work.
Functions such as ML for classification, interactive data lineage and search were, of course, also presented. One highlight turned out to be the use of “Usage Metadata”, which helps both consumers and data stewards to better assess and prioritize data sets based on their popularity. Alation obtains this information from the logs of the connected systems.
Alation was the only vendor to show which SQL statements are typically used to query data. This gives the data steward a better insight into how data is used and which tables the data is typically linked to. Alation takes the view that one person cannot know everything. It has therefore introduced a feature that allows a data curator (steward) to identify the top user(s) of a data object and contact them to complete the information on the data. Data access itself is also initiated in Alation via a workflow.
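The idea of deriving usage metadata from query logs can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Alation's implementation: the log format, the sample entries and the function names `table_popularity` and `top_users` are hypothetical, and the regular expression only handles plain `FROM` and `JOIN` clauses.

```python
import re
from collections import Counter

# Hypothetical query-log entries as (user, SQL statement) pairs.
QUERY_LOG = [
    ("anna", "SELECT * FROM sales.orders o JOIN sales.customers c ON o.cid = c.id"),
    ("ben", "SELECT count(*) FROM sales.orders"),
    ("anna", "SELECT name FROM sales.customers"),
    ("carl", "SELECT * FROM sales.orders WHERE total > 100"),
]

# Naive pattern: the identifier that follows FROM or JOIN.
# Subqueries, CTEs and quoted names would need a real SQL parser.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def table_popularity(log):
    """Count, per table, the number of logged queries that reference it."""
    counts = Counter()
    for _user, sql in log:
        # set() ensures a table is counted once per query.
        counts.update(set(TABLE_PATTERN.findall(sql)))
    return counts


def top_users(log, table):
    """Rank users by how many of their queries reference the given table."""
    users = Counter(
        user for user, sql in log if table in TABLE_PATTERN.findall(sql)
    )
    return users.most_common()


print(table_popularity(QUERY_LOG).most_common())
print(top_users(QUERY_LOG, "sales.customers"))
```

The same log yields both signals mentioned above: a popularity ranking that helps consumers and stewards prioritize data sets, and a list of frequent users whom a steward could contact to fill gaps in the documentation.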
Lastly, Alation Compose is worth mentioning: an SQL editor that helps users build queries while taking into account the existing contextual information from the data catalog.
The vote of the live audience on the top 3 strengths of Alation’s tool
In the Q&A, questions came up about historization in the metadata repository, NoSQL and Hadoop support, and SPARQL graph databases. You can watch the Q&A and the entire webinar here.
In the next post of this three-part blog series, we will look at the performance of the three market challengers – dataspot, Synabi and Zeenea.