Evaluating Modern Data Governance Catalogs — Part 2

Learn how to evaluate the vendor criteria of a Data Catalog solution for your organization.

Shravan Deolalikar
Infostrux Engineering Blog

--

In Part 1 of Evaluating Modern Data Governance Catalogs, we touched on the Data Architecture criteria we use at Infostrux to evaluate Data Catalog solutions for our clients. (If you haven't read it, find it here).

As a quick recap: discoverability, addressability, security, reliability, and self-describing features are the requirements to use when assessing whether a Data Catalog solution fits your data ecosystem. In this article, I will explore more avenues for evaluating a modern data catalog and discuss some of the trends we are seeing.

Data Democratization and Data Governance

At its heart, Data Governance is about changing people’s relationships with data. An organization’s key driver for adopting a data catalog often leans more heavily toward either compliance and security concerns or data democratization and discoverability; some companies explicitly want both. It’s essential to recognize that data governance and democratization are joined at the hip.

The tension here is that the principle of data sharing tends to pull against the demands of data security. Companies strive to democratize data through initiatives like data literacy programs or data catalogs while ensuring confidential data is not compromised.

Depending on where your requirements and company culture fall, collaborative data catalog features such as discussions or commenting may drive your selection more than, for example, formalized data governance workflows. Smaller product-oriented enterprises, for instance, may be more concerned with data discoverability than with compliance and security issues.

Most of the top contenders in the space have sufficient features addressing both ends of the spectrum. Some vendors, however, lack mature governance workflows for controlling which metadata is approved. Gating metadata based on defined criteria is a best practice for a proper data catalog implementation.

Take the time to think through how the different actors or personas in your data ecosystem interact with data assets. Think about the user experience: How will different personas or user types interact with one another? Is collaboration oriented around building context for a data asset? Make sure the platform enables discoverability sufficiently. This is generally achieved through robust search features backed by proper metadata modeling, where relationships can be stored in a knowledge graph. The ability to see lineage and asset dependencies, and to curate what users see, enables discovery.

Figure: a meta-metadata model example that facilitates data discoverability.
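To make the modeling idea concrete, here is a minimal sketch in Python of how a catalog might store assets and their relationships as a small knowledge graph and answer search and dependency queries. The class and field names are illustrative assumptions, not any vendor's actual model.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    """One catalog entry: a table, dashboard, pipeline, etc."""
    name: str
    asset_type: str                      # e.g. "table", "dashboard"
    tags: set[str] = field(default_factory=set)
    description: str = ""

class CatalogGraph:
    """Tiny in-memory knowledge graph: assets are nodes, typed relationships are edges."""

    def __init__(self):
        self.assets: dict[str, Asset] = {}
        self.edges: list[tuple[str, str, str]] = []  # (source, relation, target)

    def add_asset(self, asset: Asset) -> None:
        self.assets[asset.name] = asset

    def relate(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def search(self, term: str) -> list[Asset]:
        """Naive keyword search over names, tags, and descriptions."""
        term = term.lower()
        return [
            a for a in self.assets.values()
            if term in a.name.lower()
            or term in a.description.lower()
            or any(term in t.lower() for t in a.tags)
        ]

    def related(self, name: str) -> list[tuple[str, str]]:
        """Everything directly connected to an asset, in either direction."""
        return ([(rel, tgt) for src, rel, tgt in self.edges if src == name]
                + [(rel, src) for src, rel, tgt in self.edges if tgt == name])

g = CatalogGraph()
g.add_asset(Asset("orders", "table", {"sales", "pii"}, "Raw order events"))
g.add_asset(Asset("daily_revenue", "dashboard", {"sales"}, "Revenue by day"))
g.relate("daily_revenue", "derived_from", "orders")

print([a.name for a in g.search("sales")])  # ['orders', 'daily_revenue']
print(g.related("orders"))                  # [('derived_from', 'daily_revenue')]
```

The point of the structure is that search can match on names, tags, and descriptions, while the edge list captures the relationships a user follows when building context around an asset.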

Ensure you are not flooding people with useless metadata and creating a swamp of information. Work through how you will make assets visible to business users. Understand what approvals and what kind of governance checks you need in place before a data asset is available for consumption.
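As an illustration of that gating, the following sketch checks that an asset carries required metadata and a steward's approval before it becomes visible to consumers. The field names (owner, classification, approved_by) are hypothetical, not a specific vendor's schema.

```python
# Minimal gating check before an asset becomes visible to business users.
# Field names are illustrative assumptions, not a vendor's actual schema.
REQUIRED_FIELDS = ("owner", "description", "classification")

def ready_to_publish(asset: dict) -> tuple[bool, list[str]]:
    """Return (ok, problems): publishable only if required metadata
    is present and a data steward has approved the asset."""
    problems = [f for f in REQUIRED_FIELDS if not asset.get(f)]
    if not asset.get("approved_by"):
        problems.append("missing steward approval")
    return (not problems, problems)

ok, problems = ready_to_publish({"owner": "finance", "description": "Daily revenue"})
print(ok, problems)  # False ['classification', 'missing steward approval']
```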

DataOps and Data Engineers

One trend in the market is that many of the newer data catalog vendors are heavily oriented toward DataOps. They integrate well with the technical landscape: they have crawlers and automated extraction mechanisms that ingest metadata from transformation tools like dbt, cloud data warehouses like Snowflake, and many other sources.
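To give a feel for what that automated extraction does under the hood, here is a bare-bones sketch that pulls column-level metadata from Snowflake's INFORMATION_SCHEMA using the snowflake-connector-python package. The connection parameters are placeholders, and real crawlers add incremental syncs, credential management, and lineage parsing on top of this.

```python
# Sketch of a minimal metadata crawler against Snowflake's INFORMATION_SCHEMA.
# Connection parameters are placeholders for illustration only.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",    # placeholder
    user="your_user",          # placeholder
    password="your_password",  # placeholder
    database="ANALYTICS",      # placeholder
)

cur = conn.cursor()
cur.execute("""
    SELECT table_schema, table_name, column_name, data_type, comment
    FROM information_schema.columns
    ORDER BY table_schema, table_name, ordinal_position
""")

# Shape each row into a catalog entry ready for ingestion.
catalog_entries = [
    {"schema": s, "table": t, "column": c, "type": d, "description": cm or ""}
    for (s, t, c, d, cm) in cur
]
cur.close()
conn.close()
```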

The benefit of this is technically focused observability across your whole data ecosystem. You can see the lineage of your data assets and sometimes even pinpoint where data pipeline issues compromise them. While this dramatically benefits your analytics engineers and data engineers, this level of detail can be of little value to more business-oriented users.
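For example, the question "which assets does this broken pipeline affect?" becomes a simple graph traversal once lineage edges are captured. A minimal sketch, with illustrative asset names:

```python
from collections import deque

# Lineage as edges: upstream asset -> set of direct downstream assets.
# Asset names are illustrative.
lineage = {
    "raw.orders": {"stg.orders"},
    "stg.orders": {"mart.daily_revenue", "mart.customer_ltv"},
    "mart.daily_revenue": {"dashboard.revenue"},
}

def downstream_impact(failed_asset: str) -> set[str]:
    """Breadth-first walk: every asset that transitively depends on failed_asset."""
    impacted, queue = set(), deque([failed_asset])
    while queue:
        for child in lineage.get(queue.popleft(), ()):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted

print(downstream_impact("raw.orders"))
# {'stg.orders', 'mart.daily_revenue', 'mart.customer_ltv', 'dashboard.revenue'}
```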

The point here is to think carefully about the people in your organization using the tool and the different use cases involved.

Vendor Evaluation Criteria

Like any other technology procurement for your enterprise, examining the vendor's quality and standing in the market is critical for success.

Customer Training and Certification

A good vendor will provide training and knowledge paths for your team to become skilled on the platform. This may include customer training, certification programs, and tight communication through proper documentation and timely product updates. Ensure you are working with a vendor committed to making your team self-sufficient on the platform through education.

Customer Onboarding

A vendor should provide support for a first proof of concept or, after procurement, provide resources to help implement workflows in the tool. Often this support comes in the form of a customer solutions architect who knows the technical details of the platform and can make recommendations based on your technology ecosystem. Additionally, they should be able to provide digestible project skeletons to help you onboard the tool.

Vendor Market Stability

Examine the vendor's time on the market carefully. While some of the newer contenders in the market may come at a more appealing price point, some of the more seasoned vendors have developed platforms that will grow with your needs over time, including workflows and functionality for areas such as Data Quality Management, Policy Management, Observability, and other vital areas.

From a security perspective, some of the veterans have more robust products. We have identified some gaps from a technical product perspective in some of the newer competitors.

Veterans tend to have seasoned customer onboarding and technical support that will walk you through implementing various workflows you define on the platform.

Data Quality Management and Data Profiling

I will touch briefly on Data Quality Management functionality and how it fits into a Data Catalog evaluation.

Some of the top contenders in the space offer Data Quality Management tooling that sits either inside the data catalog or alongside it as a sidecar service.

While Data Quality Management might not be a top concern when evaluating a Data Catalog solution, it is worth considering the benefits of integrated Data Quality Management.

Data Catalogs with integrated Data Quality Management functionality can simplify data stewardship workflows significantly. Stewards can manage approval of data assets and enforce data quality compliance within the same toolchain. From a business user's or consumer's perspective, if data quality is easily discernible for the assets in the catalog, it goes a long way toward establishing trust.

While it's common for data teams to start with metadata management, at a certain maturity your team will want to address data quality comprehensively. This involves some level of data stewardship practice, whether formal or informal; data profiling to monitor critical data elements; and, lastly, general observability of the data quality thresholds you set, usually in the form of a dashboard with measures from statistical process control. Control planes for all these activities will pay dividends in the long run.
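As a minimal sketch of that last step, the following computes statistical process control limits (the mean plus or minus three standard deviations) over a profiled metric, here an assumed daily null rate for a critical column, and flags out-of-control measurements:

```python
import statistics

# Illustrative daily null-rate measurements for a critical column.
null_rates = [0.010, 0.012, 0.009, 0.011, 0.010, 0.013, 0.011]

# Statistical process control: limits at mean +/- 3 sigma.
mean = statistics.mean(null_rates)
sigma = statistics.stdev(null_rates)
upper, lower = mean + 3 * sigma, max(0.0, mean - 3 * sigma)

def in_control(measurement: float) -> bool:
    """True if the profiled metric falls inside the control limits."""
    return lower <= measurement <= upper

print(in_control(0.011))  # True: normal variation
print(in_control(0.080))  # False: likely a pipeline issue worth a steward's attention
```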

Conclusion

Understanding the interplay between Data Governance and Data Discoverability, and how people in your organization will interact with your data platform, should drive your evaluation criteria. Remember to appreciate the need for governance controls while making sure asset discovery is seamless for your personnel.

Carefully examining the maturity of the vendor you are dealing with can pay dividends in the long run. It can make or break your metadata management strategy. Have a vision of how your ecosystem will expand, and see if the vendor can grow with you and provide you with the support you need.

At Infostrux, we have helped clients evaluate and implement data governance tooling and workflows. Check out our solutions and contact us if you need help assessing and implementing a data catalog solution, or with other data management initiatives.

Thanks for reading my blog post. I’m Shravan Deolalikar, Principal Data Architect for Infostrux Solutions. You can follow me on LinkedIn.

Subscribe to Infostrux Blog at https://blog.infostrux.com for the most interesting Data Engineering and Snowflake news.
