Jordan
10/21/2022, 8:49 PM
catalog.yml entries for the inputs and outputs of each batch process.
Therefore, I was hoping to implement a solution using hooks that would avoid this tedium. If possible, I would like the solution to:
1. Dynamically populate the catalog with input and output entries for each partitioned dataset.
2. Instantiate and run the modular pipeline using each partitioned dataset’s dynamically populated catalog entries.
3. Make the output datasets of each run available via the data catalog at any time.
This should (maybe) be possible with some combination of the after_context_created, after_catalog_created, and before_pipeline_run hooks, but I'm unsure how to actually implement this.
Any guidance would be much appreciated, cheers.

Deepyaman Datta
10/25/2022, 1:51 PM
after_catalog_created (I wrote a plugin called Kedro-Accelerator a while back that does catalog modification), and you can even add a step to overwrite the existing catalog file if you want.
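The after_catalog_created approach suggested above might be sketched as a hook class like the one below. Everything here is illustrative: the class name DynamicCatalogHooks, the partition ids, and the plain dict standing in for Kedro's DataCatalog (so the snippet runs without Kedro installed) are all assumptions, not the questioner's actual project. In a real project, the class would live in the project's hooks.py, the decorator would be imported from kedro.framework.hooks, and you would register real dataset objects (e.g. MemoryDataset) rather than placeholders.

```python
def hook_impl(func):
    # Stand-in for `from kedro.framework.hooks import hook_impl`, used here
    # only so this sketch is self-contained without a Kedro installation.
    return func


class DynamicCatalogHooks:
    # Hypothetical partition ids; in practice you might derive these from a
    # PartitionedDataset's partitions or from project parameters.
    PARTITIONS = ["part_a", "part_b"]

    @hook_impl
    def after_catalog_created(self, catalog):
        # Register one input and one output entry per partition so a modular
        # pipeline instance can later be wired up against each pair.
        for partition in self.PARTITIONS:
            catalog[f"{partition}.input"] = None   # e.g. MemoryDataset() in Kedro
            catalog[f"{partition}.output"] = None  # e.g. MemoryDataset() in Kedro
```

In an actual Kedro project, the hook class would then be registered (e.g. via the HOOKS tuple in settings.py) so Kedro invokes it after building the catalog, before any pipeline runs.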