Hey everyone, I have a modular pipeline that I'm running multiple times, with the only difference being a small parameter change. The dataset is really large, and the issue I'm running into is that Kedro runs the first node of each modular pipeline before finishing any of them, which causes an OOM issue. Is there a way to make it finish running one modular pipeline before beginning the next?
05/11/2023, 8:28 AM
Maybe somebody from the Kedro team will have more ideas, but generally pipeline execution order is not guaranteed (I think there's a better word for this). Basically, the order from one run to the next need not be the same; the only thing that will always hold is that a node which generates an output runs before any node that takes that output as an input. The workaround I've seen before is to force the order of execution by making an artificial output of one pipeline an input of another, but I think it's generally discouraged.
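To make the dummy-output idea concrete, here is a toy sketch in plain Python. It is not Kedro's scheduler, only a model of the "producer runs before consumer" rule, using the standard-library `graphlib`. All node and dataset names (`a.load`, `a.done`, and so on) are hypothetical. The artificial dataset `a.done` is emitted by the last node of the first pipeline and listed as an extra input of the first node of the second, so every valid order finishes pipeline `a` first.

```python
from graphlib import TopologicalSorter

# Toy model only: node name -> (input datasets, output datasets).
# All names are hypothetical; "a.done" is the artificial dataset.
NODES = {
    "a.load": ({"raw"}, {"a.table"}),
    "a.train": ({"a.table"}, {"a.model", "a.done"}),  # extra dummy output
    "b.load": ({"raw", "a.done"}, {"b.table"}),       # extra dummy input
    "b.train": ({"b.table"}, {"b.model"}),
}


def execution_order(nodes):
    """Return one valid order in which every producer precedes its consumers."""
    # Map each dataset to the node that produces it.
    producers = {
        out: name for name, (_, outputs) in nodes.items() for out in outputs
    }
    # For each node, collect the nodes it depends on via its inputs.
    graph = {
        name: {producers[i] for i in inputs if i in producers}
        for name, (inputs, _) in nodes.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(NODES)
print(order)
```

In a real project the node function on the receiving side would need to accept the dummy dataset as an additional argument and ignore it, which is part of why this workaround tends to be discouraged.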
The other simpler option would be simply to run the two pipelines on their own sequentially.
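A minimal sketch of that second option, assuming each variant is registered as its own pipeline under hypothetical names `variant_a` and `variant_b`. It just calls `kedro run --pipeline <name>` once per pipeline, so each run is a separate process and its memory is released before the next one starts.

```python
import subprocess

# Hypothetical registered pipeline names; replace with your own.
PIPELINES = ["variant_a", "variant_b"]


def build_commands(names):
    """Build one `kedro run` command per registered pipeline name."""
    return [["kedro", "run", "--pipeline", name] for name in names]


def run_all(names):
    """Run the pipelines one after another, stopping at the first failure."""
    for cmd in build_commands(names):
        # check=True raises CalledProcessError if a run exits non-zero,
        # so later pipelines do not start after a failed one.
        subprocess.run(cmd, check=True)
```

Calling `run_all(PIPELINES)` from the project root would then execute the runs strictly one at a time.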
Nok Lam Chan
05/11/2023, 11:10 AM
Thanks, @Iñigo Hidalgo, you are right that it’s generally unordered. If you are using SequentialRunner to run the same set of pipelines, though, the order will be the same from run to run (a change that was added last year).
I don’t have a better answer off the top of my head: use a dummy input/output if you absolutely need to control the order.
It’s a fair requirement for an OOM issue.