# questions
l
Maybe this is pushing the current dataset factories too far but is it possible to parametrise a SQL Catalog entry where the SQL is read from a file? Like:
```yaml
mytable:
  type: pandas.SQLQueryDataset
  credentials: postgres_dwh
  filepath: sql/mytable.sql
```
basically, I'd like to pass parameters to the SQL query
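[Editor's note: a possible sketch of that combination. As far as I can tell, pandas.SQLQueryDataset accepts either sql or filepath, and its load_args are forwarded to pandas.read_sql_query, which takes a params argument (the placeholder syntax in the .sql file depends on the DB driver, e.g. %(cutoff)s for psycopg2). Whether this covers the use case should be checked against the kedro-datasets docs; the cutoff parameter below is hypothetical.]

```yaml
mytable:
  type: pandas.SQLQueryDataset
  credentials: postgres_dwh
  filepath: sql/mytable.sql
  load_args:
    params:
      cutoff: "2024-01-01"
```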
a
Hey Luis, I think you could technically do it -
```yaml
mytable_{table}:
  type: pandas.SQLQueryDataset
  credentials: <cred>
  sql: SELECT * FROM {table}
```
And in your pipeline_registry.py / pipeline.py, have a script that reads the queries from a file and generates the pipeline dynamically. Dataset factories work by reading the dataset name from the pipeline inputs/outputs and then filling in the placeholders in the catalog entry, so the dataset names might get crazy looking
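[Editor's note: a minimal sketch of the dynamic-generation idea above, kept dependency-free so it runs without Kedro installed. In a real pipeline_registry.py each spec dict would become a kedro.pipeline.node(...), and the one-table-name-per-line file format is an assumption.]

```python
from pathlib import Path


def load_table_names(path: str) -> list[str]:
    """Read one table name per line, skipping blanks (assumed file format)."""
    return [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]


def build_node_specs(tables: list[str]) -> list[dict]:
    """Build one node spec per table.

    Each input name matches the catalog factory pattern "mytable_{table}",
    so the catalog fills {table} into the SQL at resolution time.
    """
    return [
        {"inputs": f"mytable_{t}", "outputs": f"cleaned_{t}"}
        for t in tables
    ]


# In practice: build_node_specs(load_table_names("conf/tables.txt"))
specs = build_node_specs(["orders", "customers"])
print([s["inputs"] for s in specs])  # → ['mytable_orders', 'mytable_customers']
```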
l
ok, but not if the SQL is in a file, right?
a
Yeah, the factory placeholder wouldn't work from within a file, because the catalog never loads the query text itself; the file is only read during dataset initialisation
n
This is where ibis would be a better fit: SQL parameterisation, and multi-node SQL that is lazily evaluated
l
Are there docs on this? @Nok Lam Chan
n
https://docs.kedro.org/projects/kedro-datasets/en/latest/api/kedro_datasets.ibis.TableDataset.html I am not 100% sure if it supports the SQL interface, as ibis's native interface is dataframe-based, so that may require some changes.