# questions
j
Hi friends, I’m looking for some advice. I need to be able to batch process different custom partitioned datasets using the same modular pipeline, whenever required. It’s quite tedious to make `catalog.yml` entries for the inputs and outputs of each batch process, so I was hoping to implement a solution using hooks that avoids this tedium. If possible, I would like the solution to:

1. Dynamically populate the catalog with input and output entries for each partitioned dataset.
2. Instantiate and run the modular pipeline using each partitioned dataset’s dynamically populated catalog entries.
3. Make the output datasets of each run available via the data catalog at any time.

This should (maybe) be possible with some combination of the `after_context_created`, `after_catalog_created` and `before_pipeline_run` hooks, but I’m unsure how to actually implement this. Any guidance would be much appreciated, cheers.
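To make the idea concrete, here is a rough, untested sketch of the kind of hook I have in mind for point 1; the batch names, file paths and dataset types are just placeholders, and the imports assume a recent Kedro where datasets live in the separate `kedro_datasets` package:

```python
# src/<package_name>/hooks.py -- illustrative sketch only
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog
from kedro_datasets.partitions import PartitionedDataset  # older Kedro: kedro.io.PartitionedDataSet

# Placeholder: in practice these could be discovered on disk or read from parameters.
BATCHES = ["batch_a", "batch_b"]


class DynamicCatalogHooks:
    @hook_impl
    def after_catalog_created(self, catalog: DataCatalog) -> None:
        """Register one input and one output entry per batch instead of hand-writing them."""
        for batch in BATCHES:
            catalog.add(
                f"{batch}_input",
                PartitionedDataset(path=f"data/01_raw/{batch}", dataset="pandas.CSVDataset"),
                replace=True,
            )
            catalog.add(
                f"{batch}_output",
                PartitionedDataset(path=f"data/07_model_output/{batch}", dataset="pandas.CSVDataset"),
                replace=True,
            )
```

The hook class would then be registered with `HOOKS = (DynamicCatalogHooks(),)` in `settings.py`, and for point 2 I imagine instantiating the modular pipeline once per batch with its inputs and outputs remapped to these names.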
d
Sorry for the late response. On point 3, does that mean you want to actually update the `catalog.yml` file dynamically so that it can be referred to in the future? It should be fine to modify the catalog in, let's say, `after_catalog_created` (I wrote a plugin called Kedro-Accelerator a while back that does catalog modification), and you can even add a step to overwrite the existing catalog file if you want.
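For the "overwrite the existing catalog file" part, something along these lines could work; this is an untested sketch, the entry layout and type strings depend on your Kedro / kedro-datasets versions, and Kedro won't reload the file mid-run, so it only keeps the generated entries around for future reference:

```python
# Illustrative sketch: merge generated dataset definitions back into catalog.yml.
from pathlib import Path

import yaml


def append_entries_to_catalog_file(entries: dict, catalog_path: str = "conf/base/catalog.yml") -> None:
    """Write dynamically generated catalog entries into the existing YAML file."""
    path = Path(catalog_path)
    existing = {}
    if path.exists():
        existing = yaml.safe_load(path.read_text()) or {}
    existing.update(entries)
    path.write_text(yaml.safe_dump(existing, sort_keys=False))


# Example: mirror in YAML what the hook added in memory for one batch.
append_entries_to_catalog_file(
    {
        "batch_a_output": {
            "type": "partitions.PartitionedDataset",  # exact type string varies by version
            "path": "data/07_model_output/batch_a",
            "dataset": "pandas.CSVDataset",
        }
    }
)
```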