What is the best way to reuse part of a pipeline, say a hundred times, with different input parameters?
• Pipeline parses a list of available datasets (web links) - Node A
• Pipeline filters the list based on some fixed criteria to determine the final list of datasets to load and process - Node B
• For each link in the list:
◦ Load dataset - Node C1
◦ Clean dataset - Node C2
◦ Write the processed dataset to a SQL server - Node C3
• Provide summary on loaded data - Node D
I understand that nodes C1-C3 can be organized as a modular pipeline and then reused manually with different input parameters.
The problem is that the list of links is dynamic, so I need to reuse the modular pipeline C1-C3 in a loop whose iterations depend on the output of node B.
Is there a proper way to do this in Kedro? Is it even supported, given that the root pipeline is no longer a statically defined DAG once all the C1-C3 instances are included?
04/07/2023, 4:49 PM
If you search this channel you can find the ways people build dynamic pipelines in Kedro. The short answer is that they aren't natively supported and you have to achieve them through Hooks and Jinja templating.
It's very much doable, but since Kedro is built around reproducibility, there is some intentional friction when doing this sort of thing.
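To make the Jinja side of that answer concrete: Kedro's templated config loading allows Jinja2 loops inside YAML config files, so catalog entries can be generated per dataset name. A hypothetical catalog fragment might look like the following; note that the name list still has to be available when the config is loaded, so a list produced by node B at run time would additionally need a Hook to register datasets.

```yaml
# catalog.yml — hypothetical Jinja2 loop generating one raw-data entry
# per dataset name (names and filepaths are illustrative assumptions)
{% for name in ["sales", "weather"] %}
{{ name }}.raw:
  type: pandas.CSVDataSet
  filepath: data/01_raw/{{ name }}.csv
{% endfor %}
```

The generated entries line up with the namespaced dataset names a modular C1-C3 pipeline would produce, which is why the Hooks-plus-Jinja combination is the usual workaround.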