In Snowflake I’ve queried hundreds of millions of rows in seconds, built data pipelines from a vast array of sources into data models, and constructed complex user access hierarchies to fit the business demand.
With every feature release, I get excited as to what new solution I can implement with the toolset and the announcement of Snowflakes External Functions was no exception to this.
“So what are Snowflake External Functions?”
The ability to execute technical functionality not native to Snowflake but instead call resources hosted as web service. This adds a whole new domain of potential ETL capabilities to implement, leading to solutions such as:
- Retrieving data from generation services
- Accessing machine learning algorithms
- Using data science programming languages
- and Building simplified data pipelines without the need to stage any data.
As long as the service is accessible by a HTTP endpoint, Snowflake can now reach it!
“What did I do with all this new found potential?”
Made my life easier by locating local pizza restaurants more accurately…
Like any self respecting developer, I love pizza! I have it once a week (at least) so I have created a dataset that relates to my habit. I have a table full of local pizza restaurants with their name, address, and global coordinates as latitude & longitude. I’m great with data but I need these co-ordinates converting to something more memorable…
I came across the service what3words which aims to make giving directions and finding exact locations easier. In short, it has turned the world into a huge grid of 3 metre wide squares which all have been given a unique combination of three words for reference (see here for more information).
This three word reference seems much easier to remember to me than a long strings of digits for locations and therefore I want to update my dataset to include these. I could manually go to the what3words website, search each latitude & longitude row and update my table, but due to my love of pizza there are a lot of rows in my table (more than the screenshot shows!) and it would take me forever!
With the new functionality Snowflake has provided in these External Functions, I can now develop a process which calls the what3words API with some parameters and returns me back the three word address reference I desire directly to the table in the query window. This process will:
- User runs a SQL command
- HTTPS request is made to Proxy Service
- Proxy forwards this request onto to what3words API
- API responds with 3 word reference which is passed back down the chain to the user, result!
Here is an overview of the process I’ve implemented, using Microsoft Azure functionality:
Function Snowflake->>API Management Gateway: HTTP Request API Management Gateway->>Azure Logic App: Call Logic
App Function Azure Logic App–>what3words API: API Request for
3 Word Reference Azure Logic App->>API Management Gateway: HTTP Response Note Left of Azure Logic App: Response body contains
what3words API JSON
response body API Management Gateway->>Snowflake: HTTP Response Snowflake->>Business User: ResultSet Note Right of Business User: JSON returned to user in
query window
As you can see in my diagram, there are a few different components required to achieve this but I’m just going to focus on the Snowflake request and cloud service function for this article, as the other components were setup with default values and recommended provider configurations. Check out the Snowflake documentation for a detailed setup guide.
“What’s in the Snowflake Request?”
The request is generated from within Snowflake is fairly simple in that as a user, we define an External Function using the correct DDL Syntax and the rest is handled by Snowflake. The main configuration we have to do here is specify what fields we would like to be sent as the request body to our service, which are in turn converted to an array of arrays (one array per row the function is executed against)
Snowflake DDL for creating the External Function:
CREATE OR REPLACE EXTERNAL FUNCTION EFUNC_W3W_3WA(latitude number, longitude number)...
This produces a request JSON body by Snowflake as below:
{
"data": [[0, 40.76932368,-73.95456572]]
}
Note: we see three fields in each array as Snowflake prepends the row number to the request:
{
"data": [[Snowflake Row Number, Latitude, Longitude]]
}
“How does a Cloud Service Function work?”
I chose to use an Azure Logic App as my cloud service function as I’m a fan of low to no-code solutions and like how its able to natively handle API request retries or errors with little configuration. At a high level, my function is:
- Triggered by the HTTPS POST request from Snowflake, expecting the array of arrays.
- For each row in the array:
- A HTTP GET request is executed against the what3words API using the latitude and longitude co-ordinates in that row, to return a three word address reference.
- The what3words API response is parsed into a new array schema of:
[Snowflake Row Number, {what3words response JSON object}]
. - Appended this to a return array to generate the array of arrays required by Snowflake as its HTTPS response.
- Responds to the request with the array of arrays as the body.
{
"data": [
[0,
{"country":"US","square":{"southwest":{"lng":-73.954594,"lat":40.769297},
"northeast":{"lng":-73.954558,"lat":40.769324}},"nearestPlace":"Manhattan
, New York","coordinates":{"lng":-73.954576,"lat":40.769311},"words":
"obey.booth.field","language":"en","map":"https://w3w.co/obey.booth.field"}
],
[1,
{"country":"GB","square":{"southwest":{"lng":-2.591216,"lat":51.474881},
"northeast":{"lng":-2.591173,"lat":51.474908}},"nearestPlace":"Bristol",
"coordinates":{"lng":-2.591195,"lat":51.474895},"words":"pose.logo.hurray",
"language":"en","map":"https://w3w.co/pose.logo.hurray"}
]
]
}
Note: we still need to pass back the row number as the first array element!
“What happened?!”
Now all that’s left is to execute the function and watch the data roll in!
-- 1. Execute function with custom values
SELECT EFUNC_W3W_3WA(40.76932368,-73.95456572)
;
-- 2. Execute function with table field values (LAT & LONG from PIZZA_LOCATIONS)
SELECT
NAME, ADDRESS, LAT, LONG,
EFUNC_W3W_3WA(LAT, LONG) W3W_JSON
FROM PIZZA_LOCATIONS
;
-- 3. Execute function with table field values but parsed out the 3 word address reference
SELECT
NAME, ADDRESS, LAT, LONG,
EFUNC_W3W_3WA(LAT, LONG):words::varchar WHAT_3_WORDS
FROM PIZZA_LOCATIONS
;
Responses from these actions then appear as:
Awesome, now as a Snowflake user (and pizza addict) I can call the what3words API and return back the response for each latitude & longitude making finding my favourite location easier than ever before! I can also persist the three word address references for future queries or users to save resources by calling my function as part of an update statement:
-- Add column to table
ALTER PIZZA_LOCATIONS
ADD COLUMN WHAT_3_WORDS VARCHAR
;
-- Insert values into column by calling external function
UPDATE TABLE PIZZA_LOCATIONS
SET WHAT_3_WORDS = EFUNC_W3W_3WA(LAT, LONG):words::varchar
;
“My final thoughts?”
It may seem like I’ve had to set up a lot of components just to get data for a single field, but if we step back and look at the bigger picture’s potential, I now have the infrastructure setup to access data from any HTTPS endpoint if I tweak the Azure Logic Apps functionality as required.
This concept could be taken even further if a wrapper layer was developed between the function and the proxy service in that a generic process is developed to translate to/from the required Snowflake format (array of arrays) and then all calls to external endpoints should become just a configuration activity while transformation is handled by this wrapper. But for now, its time to go collect my pizza takeaway with my new coordinates!
0 Comments