# questions
s
Hello team, I have a modular pipeline which I want to run n times. n is not constant and its value can fluctuate. I then need the outputs of all of these runs as a list. This list is an input to one of the nodes of another modular pipeline. Any idea how I can proceed? What I have so far: I am running a loop n times and bundling the pipelines together.
full_pipe = sum([pipeline_1, pipeline_2, ...])
The problems with this are:
• How do I add dynamic catalog entries for the outputs of each of these runs?
• How do I aggregate all the outputs so that they can be passed as the input to another modular pipeline?
What is the best way to approach this? Thank you!
m
• how do I add dynamic catalog entries for the outputs of each of these runs?
Use dataset factories - https://docs.kedro.org/en/stable/data/kedro_dataset_factories.html
• How do I aggregate all the outputs so that it can be passed as the input to another modular pipeline.
If you know N, then it should not be a problem - just create the next pipeline with a node that has N inputs and map them from your modular outputs.
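A rough sketch of the "node with N inputs" idea, in plain Python so it runs without Kedro installed. The names `run_output_names`, `aggregate`, the `run_{i}` prefix and the `result` dataset name are all hypothetical, chosen only for illustration; it assumes each run's output is registered under its own per-run name.

```python
from typing import Any, List


def run_output_names(n_runs: int, output_name: str = "result") -> List[str]:
    """Build one dataset name per run, e.g. 'run_0.result'.

    The 'run_{i}.<name>' scheme is an assumption for this sketch; it mirrors
    how a per-run namespace would prefix each run's output.
    """
    return [f"run_{i}.{output_name}" for i in range(n_runs)]


def aggregate(*run_outputs: Any) -> List[Any]:
    """Collect however many run outputs are passed in into a single list."""
    return list(run_outputs)


# Example: three runs produce three dataset names and one combined list.
names = run_output_names(3)
combined = aggregate("a", "b", "c")
```

Once N is known at pipeline-creation time, these could be wired up with something like `node(aggregate, inputs=run_output_names(N), outputs="all_runs")`, where `all_runs` is again a made-up name. A single dataset factory pattern matching the `run_{i}` prefix would then cover the catalog entries for every run.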
s
@marrrcin I won't know N, it will be different for every run. Any workaround for the aggregation part?
Also, dataset factories are only supported from 0.18.12 onward. Is there an alternative for version 0.18.6?
m
Alternative is to upgrade 😄 How do you plan to set N then?