Anatomy of a Modern Data Solution on Snowflake — Visualization

Learn about visualization options for Snowflake and what to consider when designing a solution.

Milan Mosny
Infostrux Engineering Blog


This blog is the fourth in the “Anatomy of a Modern Data Solution on Snowflake” series. It talks about the third element of a solution — how to visualize data in Snowflake. The purpose of this exercise is to enable analysts and business users to use the data to improve their business.

Business and Technical Drivers

When deciding on approaches and technologies, consider the following:

  • Functionality — what is the visualization for, and how will the business use it? This may affect the requirements for the types and customizability of visualizations; the ability to aggregate, filter, drill down, and drill across at design time and at run time; interactivity; sharing; and exporting of results.
  • Ease of Use — consider both ease of use when consuming the visualization and when authoring the visualizations.
  • Modeling and model reuse — does the tool allow custom computation or modeling? Does it allow metric definitions that are reusable across any subset of relevant dimensions?
  • Supported clients — various mobile, desktop and web platforms could be important to the users of the solution.
  • Supported caching modes — many tools load (often pre-aggregated) data and don't require interaction with the database until a refresh is triggered. This can lead to significant improvements in interactive performance and savings in database compute costs or, conversely, to a significant increase in the visualization tool's costs. It's a trade-off worth understanding; a minimal sketch of the pattern follows this list.
  • Interactive performance — what is the desired response time to various user actions?
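
To make the caching trade-off concrete, here is a minimal sketch of the "extract and cache" pattern many tools implement internally. It assumes the snowflake-connector-python and pandas packages; the table, column names, and refresh interval are illustrative only.

```python
# Minimal illustration of the "extract and cache" pattern:
# pull a pre-aggregated result set once, answer interactions from memory,
# and only hit Snowflake again when the cache is older than the refresh interval.
import time
import pandas as pd
import snowflake.connector

REFRESH_SECONDS = 3600  # refresh at most once per hour; tune to your latency needs
_cache = {"loaded_at": 0.0, "data": None}

def get_sales_extract(conn) -> pd.DataFrame:
    stale = time.time() - _cache["loaded_at"] > REFRESH_SECONDS
    if _cache["data"] is None or stale:
        _cache["data"] = pd.read_sql(
            # Hypothetical pre-aggregated query.
            "SELECT region, order_date, SUM(amount) AS total "
            "FROM sales GROUP BY region, order_date",
            conn,
        )
        _cache["loaded_at"] = time.time()
    # Filtering, sorting, and drill-downs on the cached frame cost no warehouse credits.
    return _cache["data"]
```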

Other considerations are common to many other elements of the solution:

  • Latency and frequency of updates
  • Security and governance
  • Multitenancy
  • Location requirements
  • SLAs and Reliability
  • Support and Maintenance
  • Disaster recovery
  • Development and deployment process
  • True cost-to-own (TCO)

They were described in more detail in previous blogs.

Activities and Deliverables

As with most elements, the first step is architecture and design. The complexity can range from a simple setup of a SaaS tool to complex automation-driven, high-availability multi-tenant and multi-region deployments.

Even simple setups require some thought about security. Two general approaches are:

  • Using a service account to connect to Snowflake and relying on the visualization tool's capabilities to secure the data
  • Using SSO for both the tool and Snowflake so that each user can access only the data they are entitled to in Snowflake

There are pros and cons to each, and some compromises on the complexity of the solution almost always have to be made. The result, though, needs to be secure and well-governed.
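
As a rough illustration of the two approaches, here is a minimal sketch using the snowflake-connector-python package. The account, user, role, and warehouse names are hypothetical, and the key-pair example assumes a recent connector version that accepts the private_key_file parameter.

```python
import os
import snowflake.connector

# Approach 1: a shared service account (key-pair auth); row- and column-level
# security is then enforced inside the visualization tool itself.
service_conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user="BI_SERVICE_USER",                        # hypothetical service account
    private_key_file=os.environ["SNOWFLAKE_KEY_PATH"],
    role="BI_SERVICE_ROLE",
    warehouse="BI_WH",
)

# Approach 2: per-user SSO via the external browser flow; each user sees only
# the data their own Snowflake role allows.
sso_conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user="jane.doe@example.com",                   # the individual user's identity
    authenticator="externalbrowser",
    role="ANALYST",
    warehouse="BI_WH",
)
```

Which option fits depends on the tool: some BI platforms only support a single shared connection, while others can pass each user's credentials through to Snowflake.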

Approaches and Technologies

Snowflake

Snowflake itself provides:

  • Snowsight — a UI-based visualization tool hosted by Snowflake, aimed at users familiar with SQL, that enables the creation of simple dashboards with basic filtering functionality. Dashboards can be shared with other users.
  • Streamlit — a programmatic platform that allows the creation of simple or complex interactive data applications in Python, hosted by Snowflake (a minimal sketch follows this list).
  • Snowpark Container Services — announced at Summit 2023, Snowpark Container Services can host any containerizable visualization solution directly in Snowflake without needing a separate service or hosting environment. Open-source technology such as Apache Superset could be an option.
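
For a sense of what the Streamlit option looks like, here is a minimal sketch of a Streamlit in Snowflake app. The table and column names are hypothetical; inside Streamlit in Snowflake the app runs with an already-active Snowpark session.

```python
# streamlit_app.py -- a minimal Streamlit in Snowflake dashboard
import streamlit as st
from snowflake.snowpark.context import get_active_session

st.title("Orders per day")

# Streamlit in Snowflake provides an active Snowpark session; no credentials needed here.
session = get_active_session()

# Hypothetical table; adjust to your own schema.
df = session.sql(
    """
    SELECT order_date, COUNT(*) AS order_count
    FROM analytics.orders
    GROUP BY order_date
    ORDER BY order_date
    """
).to_pandas()

# Column names come back upper-cased by default.
st.line_chart(df, x="ORDER_DATE", y="ORDER_COUNT")
```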

Third-party software and services

There are too many to list. On the commercial side, the usual suspects are Sigma, Power BI, Tableau, and ThoughtSpot. On the open-source side, there are Apache Superset, the Pentaho suite, and others. I am curious whether open-source tools will see higher uptake thanks to the possibility of effortlessly hosting them on Snowpark Container Services.

There are newcomers fighting for the spotlight, such as Lightdash or FlexIt.

An interesting feature of some of the newcomers and incumbents is the ability to work with semantic layers such as Cube or dbt Cloud, which can simplify the definition and reuse of metrics.

Scripting / Custom Development

As a last resort, if there is nothing readily available that fits the requirements, it's always possible to do a bit of custom development and host (and orchestrate) the results using any of the myriad methods available. For example, on AWS, we can use EC2, ECS, EKS, Lambda and many other services.
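
As an example of what such custom development might look like, here is a minimal sketch of a Plotly Dash app that queries Snowflake and renders a single chart; it could be containerized and run on ECS, EKS, or a plain EC2 instance. The package choices, table, and credential handling are illustrative only.

```python
# app.py -- a minimal custom visualization: one query, one bar chart
import os
import pandas as pd
import snowflake.connector
import plotly.express as px
from dash import Dash, dcc, html

def fetch_sales() -> pd.DataFrame:
    # Hypothetical table and credentials supplied via environment variables.
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="BI_WH",
    )
    try:
        df = pd.read_sql(
            "SELECT region, SUM(amount) AS total FROM sales GROUP BY region", conn
        )
    finally:
        conn.close()
    df.columns = [c.lower() for c in df.columns]  # Snowflake returns upper-cased names
    return df

app = Dash(__name__)
app.layout = html.Div(
    [
        html.H3("Sales by region"),
        dcc.Graph(figure=px.bar(fetch_sales(), x="region", y="total")),
    ]
)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8050)
```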

From experience, the effort to get a PoC or a simple solution going looks (and is) not that high, but the effort to do it right is almost always non-trivial and underestimated.

Conclusion

The previous blogs showed us what it takes to set up a Snowflake account and load the data.

This blog shows what needs to happen to enable the visualization of Snowflake data. We have listed the considerations that need to be taken into account before selecting the right approach and touched on the actual ways to implement the task.

At this point in our journey (account, EL and visualization), we do have a data analytics solution. However, many data teams will find that it's not a very good one. It handles canned descriptive reporting reasonably well; almost everything else is weak:

  • Capabilities: Descriptive canned reports are covered; Ad-hoc reporting and Diagnostic, Predictive, and Prescriptive analytics are not
  • Trustworthiness: weak across Documentation, Availability, Data freshness, Low bug frequency, and No surprises
  • Data Quality: weak across Completeness, Integrity, Validity, Uniqueness, Consistency, Accuracy, and Conformity to standards
  • Agility: Low
  • Security and Governance: Basic

A lack of modeling, and the resulting lack of reusability, hurts agility and many aspects of trustworthiness and data quality. It also makes security and governance difficult.

A lack of automated tests and observability hurts trustworthiness, data quality and agility.

In the following blog posts, we explore how the additional elements (Data Modeling, Documentation, Automated Tests, Orchestration, DevEx, CI/CD, Data Governance, Metadata Management, Data Quality and Observability) improve our solution considerably.

Thank you for reading to the end. I hope you enjoyed the blog. Any feedback is welcome — please leave a comment!

I'm Milan Mosny, CTO at Infostrux Solutions. You can follow me here on Infostrux Medium Blog or LinkedIn. I write about Snowflake, data engineering and architecture and occasionally about other topics dear to my heart.
