Afiq Johari
10/25/2023, 10:55 AM
node(func=regenerate, inputs="mydata_sql", outputs="mydata_excel")

def regenerate(mydata):
    # run SQL stored procedure that impacts the table of mydata_sql in the database
    # reload mydata because the stored procedure will have changed mydata
    return mydata  # convert it to excel file
Unfortunately, recreating the data transformation of the stored procedure in Python may not be straightforward. That's why I depend on the stored procedure to transform/update my data.
Dmitry Sorokin
10/25/2023, 1:07 PM
Afiq Johari
10/25/2023, 3:06 PM
Nok Lam Chan
10/25/2023, 7:05 PM
mydata_sql but you need to make sure the stored procedure gets executed? And does regenerate take any parameters?
If not, I think before_dataset_loaded is a good candidate: https://docs.kedro.org/en/stable/kedro.framework.hooks.specs.DatasetSpecs.html#kedro.framework.hooks.specs.DatasetSpecs.before_dataset_loaded
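A minimal sketch of that hook idea, for the case where the stored procedure needs no parameters. The class name StoredProcedureHooks and the run_procedure callable are hypothetical; only hook_impl and the before_dataset_loaded hook name come from Kedro. The import fallback is there only so the snippet can run without Kedro installed.

```python
from typing import Callable

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so this sketch can be run without Kedro installed.
    def hook_impl(func):
        return func


class StoredProcedureHooks:
    """Run a stored procedure right before one specific dataset is loaded."""

    def __init__(self, dataset_name: str, run_procedure: Callable[[], None]):
        # dataset_name: the catalog entry to watch, e.g. "mydata_sql"
        # run_procedure: hypothetical callable that triggers the SP in the database
        self._dataset_name = dataset_name
        self._run_procedure = run_procedure

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str) -> None:
        # Only react to the watched dataset; every other load is left alone.
        if dataset_name == self._dataset_name:
            self._run_procedure()
```

To activate it, an instance would go into HOOKS in settings.py, for example HOOKS = (StoredProcedureHooks("mydata_sql", my_sp_call),), where my_sp_call stands for whatever function triggers the SP. As far as I know, the hook then fires whenever the runner loads that dataset for any node, so the SP would run on each such load.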
If yes, it's a bit tricky because it's not pure I/O, but it is also not processing logic: the real compute happens in the database, and your code only triggers the SP and loads the data (which is more the responsibility of a dataset). In that case you will most likely need some "dummy input/output" to make sure the dependency is correct.
Afiq Johari
10/26/2023, 2:40 AM
regenerate takes in parameters, but these parameters are actually for the stored procedure (SP).
At the moment, there's no plan to migrate the SP to Python, which is why we rely on the SP.
The latest code iteration I have reloads the data within the node.
I added this to nodes.py:
from kedro.io import DataCatalog

# config_loader: a Kedro config loader instance (its creation is not shown here)
credentials = config_loader.get("credentials.yml")
catalogs = config_loader.get("catalog.yml")
thedata = DataCatalog.from_config(catalogs, credentials)

def regenerate(mydata):
    # run SP
    # reload sql data after SP execution
    mydata = thedata.load("mydata_sql")
    return mydata
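For comparison, here is a rough sketch of the "dummy input/output" idea mentioned earlier in the thread, which would avoid building a second DataCatalog inside nodes.py. It assumes the SP call can live in its own node. The names run_stored_procedure, execute_sp, sp_done and the params:sp_params entry are all hypothetical, and execute_sp is only a placeholder for the real database call.

```python
def execute_sp(sp_params: dict) -> None:
    """Hypothetical placeholder: replace the body with the real database call."""
    print(f"would execute the stored procedure with {sp_params}")


def run_stored_procedure(sp_params: dict) -> bool:
    """Node 1: trigger the SP and emit a dummy output once it has finished."""
    execute_sp(sp_params)
    return True


def regenerate(mydata, sp_done: bool):
    """Node 2: sp_done is unused; it only forces this node to run after node 1."""
    return mydata


def create_pipeline(**kwargs):
    # Imported lazily so the node functions above can be used without Kedro.
    from kedro.pipeline import node, pipeline

    return pipeline(
        [
            node(run_stored_procedure, inputs="params:sp_params", outputs="sp_done"),
            node(regenerate, inputs=["mydata_sql", "sp_done"], outputs="mydata_excel"),
        ]
    )
```

Because regenerate lists sp_done as an input, Kedro has to run the SP node first, and the runner loads mydata_sql only when regenerate is about to run, so the node should see the table as the SP left it. sp_done is not declared in the catalog here, so it would stay in memory.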