https://kedro.org/ logo
#questions
Title
# questions
e

Eugene P

12/22/2022, 2:52 PM
Hi everyone! Wanted to check with you as more experienced kedroids if I’m doing something stupid. Several preliminary steps of my workflow requires running some heavy-lifting sql queries with Postgres/PostGIS (must be executed in particular order). At the moment I’m doing it the following way: 1. I have separate folder with SQL queries 2. I use catalog to declare
pandas.SQLQueryDataSet
. One for each query. 3. I have generic node function to call SQL query, returning empty df like this
Copy code
def run_sql_script_node(sql_query_dataset: pd.DataFrame,
                        blank_df_for_nodes_order: pd.DataFrame,):
    return pd.DataFrame()
4. I define required nodes controlling the execution order by using consecutive empty_df outputs/inputs
Copy code
node(
   func=run_sql_script_node,
   inputs=["create_rropen_cadcost_schema_and_tables_dataset", "empty_cadcost_df0"],
   outputs="empty_cadcost_df1",
   name="create_rropen_cadcost_schema_and_tables_node",
        ),
node(
   func=run_sql_script_node,
   inputs=["create_rropen_cadcost_staging_table_dataset", "empty_cadcost_df1"],
   outputs="empty_cadcost_df2",
   name="create_rropen_cadcost_staging_table_dataset_node",
        ),
I do understand that Kedro may be the not-the-100%-appropriate-tool to control SQL workflows, but for the sake of total DS pipeline integrity and my kedro-learning would like to stick to it (it is amazing, btw!). This workaround works and works correctly, but I was thinking that this approach can be further simplified? May be there is a way to execute sql-queries in particular order without creation of catalog entries for datasets, for example? Thx in advance for critique and suggestions!
o

Olivier Ho

12/22/2022, 3:24 PM
I had similar issues to you but i'm not from the kedro team so take my answer with a grain of salt. If the sql queries do not change. Would it not be better to create a custom dataset where the _load function returns an iterator as it was defined in the SQLQueryDataSet
👍 1
?
e

Eugene P

12/22/2022, 3:27 PM
Sounds promising, thanks! I’ll give a try
8 Views