Is it possible to reload data within a function/...
# questions
a
Is it possible to reload data within a function/node? For example:
```python
node(func=regenerate, inputs="mydata_sql", outputs="mydata_excel")

def regenerate(mydata):
    # run the SQL stored procedure that updates the table behind mydata_sql in the database
    # reload mydata, because the stored procedure will have changed it
    return mydata  # convert it to an Excel file
```
Unfortunately, recreating the data transformation of the stored procedure in Python may not be straightforward. That's why I depend on the stored procedure to transform/update my data.
d
Hi Afiq, I think that reloading data within a node isn't a best practice in Kedro. However, have you considered achieving this by dividing the process into multiple nodes step by step?
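To illustrate the "multiple nodes" idea, here is a minimal standalone sketch. It uses `sqlite3` as a stand-in for the real database, and the function names (`run_stored_proc`, `load_updated_table`) and the boolean flag dataset are assumptions, not anything from the original thread. In a Kedro pipeline, each function would become a `node(...)`, with the flag as a "dummy" dataset that forces the second node to run only after the stored procedure has executed:

```python
import sqlite3

def run_stored_proc(conn, sp_params):
    # Stand-in for executing the real stored procedure; here, a plain UPDATE.
    # Returning a flag lets a downstream node declare a dependency on this step.
    conn.execute("UPDATE mydata SET value = value * ?", (sp_params["factor"],))
    conn.commit()
    return True

def load_updated_table(sp_done, conn):
    # Runs only after run_stored_proc, because it consumes the sp_done flag.
    assert sp_done
    return conn.execute("SELECT id, value FROM mydata ORDER BY id").fetchall()

# Demo with an in-memory database standing in for the SQL server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER, value INTEGER)")
conn.executemany("INSERT INTO mydata VALUES (?, ?)", [(1, 10), (2, 20)])
flag = run_stored_proc(conn, {"factor": 2})
rows = load_updated_table(flag, conn)
# rows == [(1, 20), (2, 40)]
```

The flag carries no data; its only job is to make the execution order explicit in the pipeline graph.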
a
@Dmitry Sorokin Basically, what I want to do with Kedro is the ability to update my SQL tables by executing some stored procedures. So the expected output from these stored procedures would be the updated SQL tables. Once the SQL tables are updated, I want to output them as Excel files. I think the main challenge I have is to create a pipeline with the right inputs and outputs so that when I execute the pipeline, it will always start with executing the stored procedures, return the SQL table, and then output the SQL table as Excel.
The node that takes in a SQL table and outputs an Excel file is fine. It's pretty straightforward. But I can't seem to properly create a node that takes in a set of parameters (related to stored procedures) and returns a SQL table.
n
So if I understand correctly - you want to load `mydata_sql`, but you need to make sure the stored procedure gets executed first?
Do you have processing logic inside `regenerate`, and does it take any `parameters`? If not, I think `before_dataset_loaded` is a good candidate: https://docs.kedro.org/en/stable/kedro.framework.hooks.specs.DatasetSpecs.html#kedro.framework.hooks.specs.DatasetSpecs.before_dataset_loaded If yes, it's a bit tricky, because it's not pure I/O but it's also not processing logic: the real compute happens in the database, and your code only triggers the SP and loads the data (which is more the responsibility of a dataset). In that case, you will most likely need some "dummy input/output" to make sure the dependency order is correct.
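A rough sketch of the `before_dataset_loaded` hook approach. To keep the sketch runnable without a Kedro project, the `@hook_impl` decorator is shown commented out, and `execute_sp` is an assumed callable that triggers the stored procedure; the method signature follows the `DatasetSpecs` page linked above. In a real project you would uncomment the decorator and register an instance of the class in `HOOKS` in `settings.py`:

```python
# from kedro.framework.hooks import hook_impl  # uncomment inside a Kedro project

class RunStoredProcHook:
    """Triggers the stored procedure whenever mydata_sql is about to be loaded."""

    def __init__(self, execute_sp):
        # execute_sp: any callable that runs the stored procedure (assumption)
        self.execute_sp = execute_sp

    # @hook_impl  # uncomment inside a Kedro project
    def before_dataset_loaded(self, dataset_name, node=None):
        # Only fire for the dataset that depends on the stored procedure
        if dataset_name == "mydata_sql":
            self.execute_sp()

# Quick check of the dispatch logic with a recording stub
calls = []
hook = RunStoredProcHook(lambda: calls.append("sp"))
hook.before_dataset_loaded("mydata_sql")
hook.before_dataset_loaded("other_dataset")
# calls == ["sp"]  -> the SP is triggered only for mydata_sql
```

With this in place, any node that declares `mydata_sql` as an input gets fresh data, and `regenerate` itself stays a pure transform.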
a
@Nok Lam Chan the `regenerate` node takes in `parameters`, but these `parameters` are actually for the stored procedures (SP). At the moment there's no plan to migrate the SP to Python, hence why we rely on the SP. The latest code iteration I have is to reload the data within the node. Added this to `nodes.py`:
nodes.py
```python
from kedro.config import OmegaConfigLoader
from kedro.io import DataCatalog

# Build a second catalog manually so the node can reload data on demand
config_loader = OmegaConfigLoader(conf_source="conf")
credentials = config_loader["credentials"]
catalog_config = config_loader["catalog"]
thedata = DataCatalog.from_config(catalog_config, credentials)
```
```python
def regenerate(mydata):
    # run SP
    # reload the SQL data after SP execution
    mydata = thedata.load("mydata_sql")
    return mydata
```
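Nok's remark that triggering the SP is "more the responsibility of a dataset" suggests another option: a custom dataset whose load runs the stored procedure first, so nodes never reload anything themselves. Below is a standalone sketch of that pattern; the class name is an assumption, `sqlite3` stands in for the real database, and in a real project the class would subclass `kedro.io.AbstractDataset` and implement `_load`/`_save`:

```python
import sqlite3

class StoredProcSQLDataset:
    """Sketch: a dataset whose load() runs the stored procedure before reading.
    In a Kedro project this would subclass kedro.io.AbstractDataset."""

    def __init__(self, conn, table, sp_sql):
        self.conn = conn
        self.table = table
        self.sp_sql = sp_sql  # stand-in for something like "EXEC my_stored_proc"

    def load(self):
        # Run the stored procedure, then read the freshly updated table
        self.conn.execute(self.sp_sql)
        self.conn.commit()
        return self.conn.execute(
            f"SELECT id, value FROM {self.table} ORDER BY id"
        ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER, value INTEGER)")
conn.execute("INSERT INTO mydata VALUES (1, 10)")
ds = StoredProcSQLDataset(conn, "mydata", "UPDATE mydata SET value = value + 1")
print(ds.load())  # [(1, 11)] -- the table is refreshed on every load
```

Registered in the catalog, such a dataset would let the pipeline stay a plain `mydata_sql -> mydata_excel` node with no manual reloading inside `regenerate`.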