# questions
r
FYI - the reason for doing this is because the source table in Snowflake is very large, and I only want to return a subset of the table, filtered by keys in another dataset
d
Would recommend using ibis.TableDataset. That will connect to your table, but you can filter in a node, and only the data you need will be processed.
This is cleaner than connecting and using SQLAlchemy in the node
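A minimal sketch of what that node might look like (the function name, column name, and datasets here are illustrative, not from an actual pipeline):

```python
# Hypothetical Kedro node, assuming the catalog loads the Snowflake table via
# ibis.TableDataset. `isochrones` arrives as a lazy Ibis table expression; the
# filter below is compiled to SQL and executed inside Snowflake, so only the
# matching rows ever leave the warehouse.
def filter_isochrones(isochrones, key_list):
    # key_list would come from the other dataset mentioned in the thread,
    # e.g. a plain list of IDs to keep.
    return isochrones.filter(isochrones.location_id.isin(key_list))
```

Nothing is materialized until a downstream step saves the result or converts it (e.g. with to_pandas()).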
r
Ok nice, I’ve never used Ibis. I’ll give that a go and see how I get on
Thanks!
Quick question - are Ibis tables and pandas dataframes interchangeable?
Or rather, if I output an ibis table from a node, and use it as an input to another node that deals with it like a pandas dataframe, is that going to cause problems?
d
I should've added a disclaimer: I work for the Ibis team. 😉 That said, I do think this is the right way to solve your need.
Quick question - are Ibis tables and pandas dataframes interchangeable?
You can convert back to a pandas table (or Polars) as you need. The dataframe syntax isn't exactly the same, but it's familiar enough.
Or rather, if I output an ibis table from a node, and use it as an input to another node that deals with it like a pandas dataframe, is that going to cause problems?
You can use to_pandas() whenever you want to leave the Snowflake context, and it'll materialize the data to a pandas dataframe
r
I see
So if I use Ibis, is that equivalent to using snowpark? I.e., execution is happening in snowflake, and not loading the data locally?
d
I.e., execution is happening in snowflake, and not loading the data locally?
Correct
r
Ok nice.
d
(oh yeah, you can also use the Snowpark dataset as an alternative; I haven't personally used it, but it should achieve the same)
r
I was avoiding the snowpark API due to the python versioning issues (as far as I can see it only accepts 3.8)
So this seems better
d
Oh yeah, that's one of the big things we lord over Snowpark 😛
r
Haha. Glad to hear it
I literally had no idea Ibis existed, so very interested to try it out
Will let you know how I get on
d
Just in the spirit of transparency—I looked it up, I think newer versions of Snowpark do support later Python versions? Think they improved this in the past year or so. I'll still push for use of Ibis, but want to make sure you know the options fairly. 🙂
r
Ah interesting. I did try to install kedro_snowflake earlier, but that seemed to have an issue with my Python version not being 3.8
I’m trying to use ibis.TableDataset now, but having difficulty getting the connection configuration right
I’ve defined my credentials in local/credentials.yml which includes:
Copy code
snowflake_prod_ibis:
  backend: "snowflake"
  user: "***"
  password: "***"
  account: "***"
  database: "***"
  warehouse: "***"
  role: "***"
Then my dataset is defined as:
Copy code
isochrones:
  type: ibis.TableDataset
  connection: snowflake_prod_ibis
  table_name: demand.location_isochrones
But when trying to load kedro, I get an error “‘str’ object has no attribute ‘get’”
It seems that it’s not loading the yaml from the credentials file?
d
The ibis.TableDataset doesn't support credentials.yml. You would need to pass it separately. You can use a resolver to "hide" it or load it from the environment.
(The reason ibis.TableDataset doesn't support credentials.yml (yet?) is because Ibis supports a number of backends, and there's no consistent way of passing credentials across all of them.)
r
Ok understood
How would I use a resolver?
d
I need a bit of time (in a meeting), but can help figure out what the credentials should look like in a sec, if it's not clear. It should map to the https://ibis-project.org/backends/snowflake#connect, so something like:
Copy code
connection:
  backend: snowflake
  user: user
  password: password
  ...
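For the resolver part, one sketch of how the secrets could be pulled from environment variables with OmegaConf's built-in oc.env resolver (the variable names are made up; note that depending on your Kedro version, oc.env may only be permitted in credentials files, in which case you'd register a custom resolver via settings.py instead):

```yaml
# Hypothetical entry in conf/base/catalog.yml (or a globals file),
# resolving secrets from environment variables at load time.
_snowflake_prod_ibis:
  backend: snowflake
  user: "${oc.env:SNOWFLAKE_USER}"
  password: "${oc.env:SNOWFLAKE_PASSWORD}"
  account: "${oc.env:SNOWFLAKE_ACCOUNT}"
  database: "${oc.env:SNOWFLAKE_DATABASE}"
  warehouse: "${oc.env:SNOWFLAKE_WAREHOUSE}"
  role: "${oc.env:SNOWFLAKE_ROLE}"
```

Datasets can then reference it as connection: ${_snowflake_prod_ibis}, keeping the secrets themselves out of version control.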
r
Cool, I managed to get credentials defined properly using ibis outside of kedro, so will play with that some more
r
Ok so I think I’m nearly there - I can connect to snowflake via ibis in a notebook and get the data I want, although I noticed that if I add ‘schema’ as a parameter to ibis.snowflake.connect, it throws some sort of UDF error. Also, con.table() requires a database parameter (which is the snowflake schema).
In kedro, I’m not getting an error with the credentials provided, but I’m getting “Table [XXX] does not exist or not authorised” when loading table in kedro
I think this is likely to do with needing to provide the snowflake schema, but I could be wrong. Note that in the catalog, including the schema in the table_name parameter doesn’t work either
To be clearer, neither of these catalog definitions works:
Copy code
isochrones:
  type: ibis.TableDataset
  connection: ${_snowflake_prod_ibis}
  table_name: DEMAND.LOCATIONS
Copy code
isochrones:
  type: ibis.TableDataset
  connection: ${_snowflake_prod_ibis}
  table_name: LOCATIONS
Where snowflake schema = “DEMAND” and snowflake table = “LOCATIONS”
Any idea how to include the schema in the definition?
d
Let me take a look!
r
locations = con.table('LOCATIONS', database='DEMAND')
returns correct data, so I assume this is a config issue
Thanks!
FYI - I’m testing this in a notebook with ib_table = catalog.load('isochrones'), which is where it is failing
d
locations = con.table('LOCATIONS', database='DEMAND')
returns correct data, so I assume this is a config issue
How did you create con here? Can you share the code?
r
Copy code
con = ibis.snowflake.connect(
    user="***",
    password="***",
    account="***",
    database="***",
    warehouse="***",
    role="***"
)
d
and database here is not "DEMAND", right?
r
Correct. In this case, the database is the actual snowflake database name
(ANALYSIS)
And that’s exactly how I’ve defined the connection parameters in catalog.yml
d
In short, I think the gap is simply that the ibis.TableDataset doesn't support arguments to the underlying table method, as you've pointed out. If this line were different, you could pass your argument: https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/kedro_datasets/ibis/table_dataset.py#L174 I can push an update for this. In the meantime, you can easily vendor the dataset yourself (see https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html if you need guidance, or https://github.com/deepyaman/jaffle-shop/blob/main/src/jaffle_shop/datasets/ibis/table_dataset.py for an early version of the Ibis dataset, before it was part of kedro-datasets, to see how this can be done), and add this functionality.
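For reference, a stripped-down sketch of what the vendored change could look like (this is not the real kedro-datasets class; the kedro machinery is omitted, and only the load path that would need to change is shown):

```python
# Hypothetical vendored variant of ibis.TableDataset, forwarding extra
# keyword arguments (e.g. {"database": "DEMAND"}) to the backend's table()
# method. The real dataset would subclass kedro's AbstractDataset.
class PatchedTableDataset:
    def __init__(self, connection, table_name, load_args=None):
        self._connection = connection
        self._table_name = table_name
        self._load_args = load_args or {}

    def load(self):
        # Pass load_args through to Ibis, so the Snowflake schema can be
        # supplied the same way as con.table('LOCATIONS', database='DEMAND').
        return self._connection.table(self._table_name, **self._load_args)
```

In a full implementation, load_args would come from the catalog entry alongside table_name and connection.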
r
Ok cool, thanks for that, that makes sense.
I appreciate you pushing the update.
I’ll adapt the code in the meantime. Otherwise I’m pleased to have found out about ibis
One last question - if I follow the snowflake backend docs here - https://ibis-project.org/backends/snowflake , it says I can pass database as database=“DATABASE/SCHEMA”, however when I do that in python I get a UDF error…
Copy code
UserWarning: Unable to create Ibis UDFs, some functionality will not work: 100132 (P0000): JavaScript execution error: Uncaught Execution of multiple statements failed on statement "CREATE DATABASE IF NOT EXISTS ..." (at line 1, position 0).
SQL access control error: Insufficient privileges to operate on account ...
I assume that’s just to do with permissions on that schema
However, it then means I can’t use that connection object in any way
d
Let me ask somebody more familiar with the Snowflake backend. 🙂 That doesn't seem like it should happen.
r
Ok cool. Much appreciated!
What time zone are you in btw?
d
In short, you did something like:
Copy code
con = ibis.snowflake.connect(
    user="user",
    account="safpqpq-sq55555",
    database="my_database/my_schema",
    warehouse="my_warehouse",
    authenticator="externalbrowser",
)
and that didn't work, right? Got the Ibis UDF error.
r
Yep exactly
d
What time zone are you in btw?
US Mountain Time (Salt Lake City, UT)
r
I’ve connected on LinkedIn - always good to know great data people!
d
@Rob McInerney I asked the team; looks like the Snowflake docstring is out of date! You should just be able to connect like:
Copy code
con = ibis.snowflake.connect(
    user="user",
    account="safpqpq-sq55555",
    database="my_database",
    schema="my_schema",
    warehouse="my_warehouse",
    authenticator="externalbrowser",
)
In that case, you should also be able to use the Ibis TableDataset as is, by passing schema in the connection config. (Specifically, any of the input parameters in https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-api#label-snowflake-connector-methods should get passed under the hood from https://github.com/ibis-project/ibis/blob/9.3.0/ibis/backends/snowflake/__init__.py#L270)
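Putting that together, a hypothetical catalog entry for the table from this thread (database ANALYSIS, schema DEMAND, table LOCATIONS, with placeholder credentials) might look like:

```yaml
# Hypothetical catalog.yml entry; connection keys map through to
# ibis.snowflake.connect and the underlying Snowflake connector parameters.
isochrones:
  type: ibis.TableDataset
  table_name: LOCATIONS
  connection:
    backend: snowflake
    user: "***"
    password: "***"
    account: "***"
    database: ANALYSIS
    schema: DEMAND
    warehouse: "***"
    role: "***"
```

With schema passed in the connection, table_name no longer needs any schema qualification.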