# questions
j
Hi friends, I’m looking for some advice. I need to be able to batch process different custom partitioned datasets using the same modular pipeline, whenever required. It’s quite tedious to make `catalog.yml` entries for the inputs and outputs of each batch process, so I was hoping to implement a solution using hooks that avoids this tedium. If possible, I would like the solution to:

1. Dynamically populate the catalog with input and output entries for each partitioned dataset.
2. Instantiate and run the modular pipeline using each partitioned dataset’s dynamically populated catalog entries.
3. Make the output datasets of each run available via the data catalog at any time.

This should (maybe) be possible with some combination of the `after_context_created`, `after_catalog_created` and `before_pipeline_run` hooks, but I’m unsure how to actually implement this. Any guidance would be much appreciated, cheers.
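To make the idea concrete, here is a rough, untested sketch of the kind of hook I have in mind for point 1; the batch names, file paths and dataset types are just placeholders, and the imports assume a recent Kedro where datasets live in the separate `kedro_datasets` package:

```python
# src/<package_name>/hooks.py -- illustrative sketch only
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog
from kedro_datasets.partitions import PartitionedDataset  # older Kedro: kedro.io.PartitionedDataSet

# Placeholder: in practice these could be discovered on disk or read from parameters.
BATCHES = ["batch_a", "batch_b"]


class DynamicCatalogHooks:
    @hook_impl
    def after_catalog_created(self, catalog: DataCatalog) -> None:
        """Register one input and one output entry per batch instead of hand-writing them."""
        for batch in BATCHES:
            catalog.add(
                f"{batch}_input",
                PartitionedDataset(path=f"data/01_raw/{batch}", dataset="pandas.CSVDataset"),
                replace=True,
            )
            catalog.add(
                f"{batch}_output",
                PartitionedDataset(path=f"data/07_model_output/{batch}", dataset="pandas.CSVDataset"),
                replace=True,
            )
```

The hook class would then be registered with `HOOKS = (DynamicCatalogHooks(),)` in `settings.py`, and for point 2 I imagine instantiating the modular pipeline once per batch with its inputs and outputs remapped to these names.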
d
Sorry for the late response. On point 3, does that mean you want to actually update the `catalog.yml` file dynamically so that it can be referred to in the future? It should be fine to modify the catalog in, let's say, `after_catalog_created` (I wrote a plugin called Kedro-Accelerator a while back that does catalog modification), and you can even add a step to overwrite the existing catalog file if you want.
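For the "overwrite the existing catalog file" part, something along these lines could work; this is an untested sketch, the entry layout and type strings depend on your Kedro / kedro-datasets versions, and Kedro won't reload the file mid-run, so it only keeps the generated entries around for future reference:

```python
# Illustrative sketch: merge generated dataset definitions back into catalog.yml.
from pathlib import Path

import yaml


def append_entries_to_catalog_file(entries: dict, catalog_path: str = "conf/base/catalog.yml") -> None:
    """Write dynamically generated catalog entries into the existing YAML file."""
    path = Path(catalog_path)
    existing = {}
    if path.exists():
        existing = yaml.safe_load(path.read_text()) or {}
    existing.update(entries)
    path.write_text(yaml.safe_dump(existing, sort_keys=False))


# Example: mirror in YAML what the hook added in memory for one batch.
append_entries_to_catalog_file(
    {
        "batch_a_output": {
            "type": "partitions.PartitionedDataset",  # exact type string varies by version
            "path": "data/07_model_output/batch_a",
            "dataset": "pandas.CSVDataset",
        }
    }
)
```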