Why Snowflake External Network Access Changes Everything

Learn about the impact that this seemingly simple feature can have

Milan Mosny
Infostrux Engineering Blog


Photo by Markus Spiske on Unsplash

Recently, Snowflake announced a public preview of its External Network Access feature. It allows Snowpark UDFs and stored procedures to reach outside networks without going through External Functions. You can use Python (or your other favorite Snowpark language) to call outside APIs directly. If you are curious about how this works, the Snowflake documentation has examples of accessing the Google Translate API and an external lambda function.
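To make "call outside APIs directly" a bit more concrete, here is a minimal sketch of what the body of such a Python handler could look like. It is an illustration under assumptions, not Snowflake's documented example: the endpoint URL, the response shape and the function names are all hypothetical, and it uses only the Python standard library. In Snowflake, the function would additionally have to be registered as a UDF that is allowed to reach that host, which is not shown here.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
API_URL = "https://api.example.com/translate"


def build_request(text, target_lang, api_url=API_URL):
    """Build the HTTP request the handler would send to the outside API."""
    query = urlencode({"q": text, "target": target_lang})
    return urllib.request.Request(
        f"{api_url}?{query}", headers={"Accept": "application/json"}
    )


def translate(text, target_lang, opener=urllib.request.urlopen):
    """Handler body: call the outside API directly and return the result.

    `opener` is injectable so the function can be tried without a network.
    """
    request = build_request(text, target_lang)
    with opener(request, timeout=10) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Assumed response shape: {"translated": "..."}
    return payload["translated"]
```

The point of the sketch is how little there is to it: an ordinary HTTP call inside an ordinary function, with no gateway or serverless function in between.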

How Does It Compare to External Functions?

The older External Functions feature (as opposed to the new External Network Access feature) required setting up an API Gateway on one of the supported clouds (AWS, Azure, GCP) and often a serverless function in that cloud to execute the functionality. It was not exactly rocket science, but it was work. There are a couple of open-source projects to make this easier, e.g., Geff, but they still require a bit of non-trivial cloud engineering to do it right.

The new External Network Access feature does not need any of this. No cloud engineering means less work. It’s all simpler and, therefore, more usable.

It’s also safer. You don’t have to worry about the security posture of a separate cloud solution: whether access to the serverless cloud functions is protected, whether the deployment of the code is governed and secured, whether there is proper monitoring and alerting when security vulnerabilities are detected, whether a junior dev left that proverbial S3 bucket open to the public, and many other potential issues.

External Network Use Cases

Extraction and Load

Extracting and loading data directly into Snowflake via External Functions was possible but rarely done. The usual way of getting data into Snowflake “by hand” was somewhat awkward: one generally needed an outside process to extract the data and upload it to a stage in cloud storage (like Amazon S3), or to an internal stage on Snowflake, and then invoke COPY INTO or Snowpipe. Doing it in real time meant standing up outside streaming services such as Kafka.

The Modern Data Stack advocated using third-party tools instead. Running open-source tools such as Airbyte or Meltano required separate infrastructure, which usually needed cloud engineering work and raised security concerns. Hosted services such as Fivetran are not exactly cheap. While the vendors do promise (and very likely deliver on) security, using them still means spreading governance and a good amount of trust to another service. Running proper CI/CD to set up data sources and targets on the hosted services is possible via APIs, but it is far from straightforward.

Now, the extractors can be run inside of Snowflake. Airbyte posted a half-hearted native app that does just that for LinkedIn ads, but chances are more apps or connectors will appear soon. Suddenly, no cloud work is required, and governance and CI/CD are taken care of by the mechanisms already used for the rest of your solution on Snowflake. It all got much simpler and more accessible.
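As a rough idea of what a small extractor running inside Snowflake might do, the sketch below pages through a source API and yields records one by one. Everything specific in it is an assumption made for illustration: the `items` and `next` fields of the response, the function names and the page limit. It uses only the standard library; writing the records to a table is left out.

```python
import json
import urllib.request


def fetch_page(url, opener=urllib.request.urlopen):
    """Fetch one page of JSON from the (hypothetical) source API."""
    with opener(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_records(start_url, fetch=fetch_page, max_pages=100):
    """Yield records page by page, following the `next` link while it exists.

    Assumed page shape: {"items": [...], "next": "<url of next page>"}.
    `max_pages` is a safety limit against endless pagination.
    """
    url = start_url
    pages = 0
    while url and pages < max_pages:
        page = fetch(url)
        for record in page.get("items", []):
            yield record
        url = page.get("next")
        pages += 1
```

Because `fetch` is a parameter, the paging logic can be tested with canned pages before it ever touches a real API.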

Reverse ETL

A similar story can be told about reverse ETL. The options were External Functions, open-source services (not many) and hosted services. Now, invoking outside APIs to do the work is simple. Perhaps we'll soon have a set of native apps that make it even easier.
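For reverse ETL, the core of the work is pushing rows out to some API in batches. The sketch below shows one way that could look. The target URL, the `{"records": [...]}` payload shape and the function names are hypothetical, and real code would also need authentication and retries, which are omitted here.

```python
import json
import urllib.request


def chunked(rows, size):
    """Split a list of rows into consecutive batches of at most `size`."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def post_batch(url, batch, opener=urllib.request.urlopen):
    """POST one batch as JSON and return the HTTP status code."""
    body = json.dumps({"records": batch}).encode("utf-8")  # assumed payload shape
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=30) as response:
        return response.status


def push_rows(rows, url, batch_size=100, send=post_batch):
    """Send rows in batches; return how many rows were in accepted batches."""
    sent = 0
    for batch in chunked(rows, batch_size):
        status = send(url, batch)
        if 200 <= status < 300:
            sent += len(batch)
    return sent
```

A call such as `push_rows(rows, "https://crm.example.com/contacts", batch_size=50)` would then send the rows fifty at a time and report how many were accepted.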

Orchestration

What’s the impact on orchestration? Snowflake provides orchestration via Tasks. It is imperfect and far from fully featured compared to Airflow or Dagster. However, it does the job reasonably well. The trouble was that Tasks could not easily orchestrate extraction or reverse ETL. Now they can. The same argument made about third-party extractors applies here: the need for cloud infrastructure or possibly expensive third-party hosted solutions, the security concerns, the separate CI/CD, and the separate governance all go away. The orchestration of the whole solution can be much simpler.

Conclusion

The seemingly simple External Network Access feature can significantly change how we build data solutions on Snowflake. Building them will be much simpler and more accessible, which also means cheaper. I would compare the impact to that of Snowpark Container Services. It's just that External Network Access is already here…

Thank you for reading all the way to the end. I hope you enjoyed the blog. Any feedback is welcome — please leave a comment!

I’m Milan Mosny, CTO at Infostrux Solutions. You can follow me here on Infostrux Medium Blog or on LinkedIn. I write about Snowflake, data engineering and architecture and occasionally about other topics dear to my heart.

