Eugene P 12/22/2022, 2:52 PM
. One for each query.
3. I have a generic node function to call the SQL query, returning an empty df, like this:

import pandas as pd

# Neither input is used in the body; the second one exists only to chain nodes in order.
def run_sql_script_node(
    sql_query_dataset: pd.DataFrame,
    blank_df_for_nodes_order: pd.DataFrame,
):
    return pd.DataFrame()

4. I define the required nodes, controlling the execution order by chaining consecutive empty_df outputs/inputs:

node(
    func=run_sql_script_node,
    inputs=["create_rropen_cadcost_schema_and_tables_dataset", "empty_cadcost_df0"],
    outputs="empty_cadcost_df1",
    name="create_rropen_cadcost_schema_and_tables_node",
),
node(
    func=run_sql_script_node,
    inputs=["create_rropen_cadcost_staging_table_dataset", "empty_cadcost_df1"],
    outputs="empty_cadcost_df2",
    name="create_rropen_cadcost_staging_table_dataset_node",
),

I do understand that Kedro may not be the 100% appropriate tool to control SQL workflows, but for the sake of overall DS pipeline integrity and my own Kedro learning I would like to stick with it (it is amazing, btw!). This workaround works correctly, but I wonder whether it can be simplified further. Maybe there is a way to execute SQL queries in a particular order without creating catalog entries for the datasets, for example? Thx in advance for critique and suggestions!
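For illustration, here is a minimal sketch of how the repeated node definitions above might be generated from an ordered list of query dataset names. The helper name chain_sql_node_specs and the "_node" naming rule are hypothetical, not part of Kedro. It does not remove the catalog entries; it only builds the chained inputs/outputs/name arguments.

```python
from typing import Dict, List


def chain_sql_node_specs(
    query_datasets: List[str],
    prefix: str = "empty_cadcost_df",
) -> List[Dict[str, object]]:
    """Build node keyword arguments so each query runs after the previous one.

    Hypothetical helper: the i-th spec takes the placeholder `<prefix><i>` as
    its second input and produces `<prefix><i+1>`, mirroring the manual chain.
    """
    specs: List[Dict[str, object]] = []
    for i, dataset_name in enumerate(query_datasets):
        specs.append(
            {
                "inputs": [dataset_name, f"{prefix}{i}"],
                "outputs": f"{prefix}{i + 1}",
                # Assumed naming rule: dataset name plus a "_node" suffix.
                "name": f"{dataset_name}_node",
            }
        )
    return specs


specs = chain_sql_node_specs(
    [
        "create_rropen_cadcost_schema_and_tables_dataset",
        "create_rropen_cadcost_staging_table_dataset",
    ]
)
```

Each spec could then be passed on as node(func=run_sql_script_node, **spec), so adding a query means adding one name to the list. The first placeholder (empty_cadcost_df0) still has to come from somewhere, as in the manual version.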
Olivier Ho 12/22/2022, 3:24 PM
Eugene P 12/22/2022, 3:27 PM