Afiq Johari
10/25/2023, 10:55 AM
node(func=regenerate, inputs="mydata_sql", outputs="mydata_excel")

def regenerate(mydata):
    # run SQL stored procedure that impacts the table of mydata_sql in the database
    # reload mydata because the stored procedure will have changed mydata
    return mydata  # convert it to excel file
Unfortunately, recreating the data transformation of the stored procedure in Python may not be straightforward. That's why I depend on the stored procedure to transform/update my data.
Dmitry Sorokin
10/25/2023, 1:07 PM
Afiq Johari
10/25/2023, 3:06 PM
Afiq Johari
10/25/2023, 3:10 PM
Nok Lam Chan
10/25/2023, 7:05 PM
mydata_sql but you need to make sure the store_proc gets executed?
Nok Lam Chan
10/25/2023, 7:08 PM
regenerate and does it take any parameters?
If not - I think before_dataset_loaded is a good candidate https://docs.kedro.org/en/stable/kedro.framework.hooks.specs.DatasetSpecs.html#kedro.framework.hooks.specs.DatasetSpecs.before_dataset_loaded
If yes - It’s a bit tricky because it’s not pure I/O, but it’s also not processing logic: the real compute happens in the database, and your code only triggers the SP and loads the data (which is more the responsibility of a dataset). In that case you will most likely need some “dummy input/output” to make sure the dependency is correct.
Afiq Johari
10/26/2023, 2:40 AM
regenerate takes in parameters, but these parameters are actually for the stored procedure (SP).
At the moment, there's no plan to migrate the SP to Python, which is why we rely on the SP.
The latest code iteration I have is to reload the data within the node
Added this to nodes.py
from kedro.io import DataCatalog

# config_loader is assumed to be an already-initialised Kedro config loader
credentials = config_loader.get("credentials.yml")
catalogs = config_loader.get("catalog.yml")
thedata = DataCatalog.from_config(catalogs, credentials)

def regenerate(mydata):
    # run SP
    # reload sql data after SP execution
    mydata = thedata.load("mydata_sql")
    return mydata
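
For comparison, the before_dataset_loaded hook suggested earlier in the thread could look roughly like the sketch below. It is only a minimal illustration, not a tested solution for this project: the class name StoredProcedureHooks and the run_procedure callable are hypothetical, and the actual database call that executes the SP is left to that callable. The fallback for hook_impl is there only so the snippet can run where Kedro is not installed.

```python
from typing import Callable

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so this sketch still runs without Kedro installed.
    def hook_impl(func):
        return func


class StoredProcedureHooks:
    """Run a stored procedure right before one chosen dataset is loaded."""

    def __init__(self, dataset_name: str, run_procedure: Callable[[], None]):
        # dataset_name: catalog entry whose table the SP refreshes.
        # run_procedure: hypothetical callable that executes the SP.
        self._dataset_name = dataset_name
        self._run_procedure = run_procedure

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str) -> None:
        # Kedro calls this hook for every dataset load,
        # so act only on the dataset we care about.
        if dataset_name == self._dataset_name:
            self._run_procedure()


# Stand-in for the real SP call, used here only to show the behaviour.
calls = []
hooks = StoredProcedureHooks("mydata_sql", lambda: calls.append("sp executed"))
hooks.before_dataset_loaded(dataset_name="mydata_sql")
hooks.before_dataset_loaded(dataset_name="other_dataset")
print(calls)  # ['sp executed']
```

In a Kedro project, an instance of such a class would typically be registered through the HOOKS tuple in settings.py. Since the SP here takes parameters, run_procedure would have to get them from somewhere other than the node inputs, which is the tricky part mentioned above.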