Snowflake March 2023 New Features and Releases

Snowflake's Latest Features and Releases with a deep dive of the month

Augusto Kiniama Rosa
Infostrux Engineering Blog

--

Photo by mana5280 on Unsplash

March was an incredibly busy month focused, it seems, on bug fixes, smaller features, and many private previews (not included in this article as they are not public yet). Jakarta in Indonesia gets a Snowflake region now.

It was hard to pick what feature to demo. This is where my bias comes in as I chose features that I like.

I have to say that a little bird says that Snowflake has many outstanding features in Private Preview at the moment, and it will be a fantastic party of announcements; I hope, during the Snowflake Summit. Snowpark is heating up, so –I would suggest you register before it sells out.

For this month, I am doing a demo on Streams on views and Snowpipe Streaming quickstart lab. I am excited about the possibilities Snowpipe streaming brings.

New features in Snowflake

New Regions

  • Preview: AWS on Asia Pacific (Jakarta)

New Features

  • General Availability:
  • Preview: Tabular Return Values from Java or Scala Stored Procedures
  • Preview: Snowpipe Streaming or Snowpipe Streaming API, very cool!! No more staging files for streaming with writing directly into TABLES.

Security

  • SCIM Updates: Support for Password Synchronization from Okta and other custom SCIM

Data Lake

  • External Table and Directory Table Auto-Refresh Observability and Billing (Beware now snowflake will bill you for auto-refresh of external tables, but you can still do manual refreshes at no cost)

Virtual Warehouse

  • General Availability: Latency and Concurrency Improvements (across all regions, Snowflake improved concurrent query processing and throughput and facilitated faster query execution)

Web Interface

  • General Availability: Individual Task Observability (graph view to highlight dependencies and order of execution)
  • Preview: Python Worksheets, we can now use Python inside SnowSights Worksheets

SQL Updates

  • CREATE SECURITY INTEGRATION & ALTER SECURITY INTEGRATION (New SCIM SYNC_PASSWORD Property)
  • SHOW SHARES Command: Support for STARTS WITH and LIMIT … FROM
  • Preview: Geospatial Functions for Shape Transformation and Orientation
  • Preview: Support for Specifying How to Handle Invalid Geospatial Shapes

Connectors

  • Bug fixes releases: (Python 3.0.1, Python 3.0.2, JDBC 3.13.29, Go 1.6.19, Spark 2.11.2, SQLAlchemy 1.4.7, ODBC 2.25.10, Snowpark Library for Python 1.20/1.3.0)
  • New features in these: (.NET Driver 2.0.21, Python 3.0.1, Python 3.0.2, Go 1.6.19, Nodejs 1.6.20, ODBC 2.25.10, Snowpark Library for Python 1.20/1.3.0, JDBC 3.13.29)
  • SnowSQL Client 1.2.25, news feature includes: recursion_limit option to limit the Python recursion depth and QUERY_TAG CLI

Data Pipelines

  • General Availability: Streams on views extend table streams to track change data capture (CDC) records for views, including secure views.

Data Governance

  • Allow Masked Columns as Inputs to Row Access Policies and Conditional Masking Policies

Replication Updates

  • Preview: Account Replication: Notification Integration Support, you can now replicate two new types: TYPE = EMAIL and TYPE = QUEUE with DIRECTION = OUTBOUND

Deep Dive of the Month

Snowpipe Streaming API

What is the difference between Snowpipe Streaming and Snowpipe? The API is a complement to Snowpipe, with the significant difference being that it is written directly into rows instead of files that need to be ingested still into rows. In a way, it enables near real-time ingest into the tables and continuously and automatically loading directly. See the image that explains the difference.

Currently

Currently, you can only do inserts, a single row, or multiple. No other DML operations are supported, like delete, for example.

I decided to simply use the Snowflake Quickstart lab for this as it was readily available instead of trying to do one. So, I played with this using the quickstart lab from Snowflake here. The lab needs some improvements as the AWS Cloudformation did quite work on the first try, but it worked well enough to demonstrate the power of Snowflake streaming.

I will try this same lab but using Azure Event Hub for a subsequent article just to demonstrate the same capability outside the AWS world.

Let me know how the lab went for you.

Streams on views

Up until now, Snowflake was able to track changes in tables. However, most people using Snowflake are using views as part of their dbt implementation inside Snowflake.

What use can you have for tracking changes in views?

The primary use case is Changed Data Capture (CDC) and changes tracking for incremental updates in your pipeline — we can recompute only what changed. The fact that you can do it on views is fantastic. It means you have more options on how you materialize the results — you can use tables or views and still track what has changed.

Outside the warehouse — we can use it for alerts driven by business logic

  • Easily figure out when a KPI goes over or under a threshold
  • Track the status of a business process changes without having to re-query the whole view and compare it to a previous state

FYI, at this point, you cannot track changes for Materialized Views.

Sorry but this month, I was a little lazy and used only Snowflake-provided examples as I experimented with the features.


-- Create multiple tables with matching column values.
CREATE TABLE birds (
id number,
common varchar(100),
class varchar(100)
);

CREATE TABLE sightings (
d date,
loc varchar(100),
b_id number,
c number
);

-- Create a view that queries the tables with a join.
CREATE VIEW bird_sightings AS
SELECT b.id AS id,
b.common AS common_name,
b.class AS classification,
s.d AS date,
s.loc AS location,
s.c AS count
FROM birds b
INNER JOIN sightings s ON b.id = s.b_id;

-- Create a stream on the view.
CREATE STREAM bird_sightings_s ON VIEW bird_sightings;

-- Insert values into the tables.
INSERT INTO birds
VALUES
(1,'Scarlet Tanager','P. olivacea'),
(14,'Mallard','A. platyrhynchos'),
(48,'Spotted Sandpiper','A. macularius'),
(92,'Great Blue Heron','A. herodias');

INSERT INTO sightings
VALUES
(current_date(),'Gibson Island',1,4),
(current_date(),'Lake Los Pajaro',14,12),
(current_date(),'Lake Los Pajaro',92,12),
(current_date(),'Gibson Island',14,21),
(current_date(),'Gibson Island',92,5);

-- Query the stream.
-- The stream displays a record for each row added to the view.
SELECT * FROM bird_sightings_s;


-- Consume the stream records in a DML statement (INSERT, MERGE, etc.).

-- Query the stream.
-- The stream is empty.
-- Delete a row from the birds table.

DELETE FROM birds WHERE id = 14;

-- Query the stream.
-- The stream displays two records for the single DELETE operation.
SELECT * FROM bird_sightings_s;

Thank you for hanging to the end with these two new features being demoed. See you again in April.

I hope you enjoyed the March releases for Snowflake.

I’m Augusto Rosa, VP of Engineering for Infostrux Solutions. Thanks for reading my blog post. You can follow me on LinkedIn. And subscribe to Infostrux Medium Blogs https://medium.com/infostrux-solutions for the most interesting Data Engineering and Snowflake news.

Sources:

--

--