#questions

Amos

10/03/2023, 6:58 AM
Hey folks, what's the best approach for dynamically generating, slicing and dicing modular pipelines so that they scale and different chunks can be applied to different inputs? For example, say I have some tabular data that requires a reasonably vanilla set of pandas-style wrangling and cleaning operations. Say some of the data is already joined and some is not, so some groups of inputs require joining or concatenation and some do not, and some groups require further feature-generation calculations. Assume the number of groups is small but variable. I guess one could parameterise the inputs and use code to generate the nodes accordingly, but it all gets rather involved rather quickly. Would a better approach perhaps be to use `kedro.pipeline.Pipeline.filter` to chop up a template and then `kedro.pipeline.modular_pipeline.pipeline` to override the inputs/parameters etc.? I'd be keen for any pointers to well-written, dynamically generated and scalable Kedro pipelines. I'm new to the framework but not new to Python. Thank you!

marrrcin

10/03/2023, 8:27 AM
We're going to release a blog post on that soon, if you're interested
In a week or so

Amos

10/03/2023, 11:20 AM
Yep, any tips would be welcome. Happy to read a draft if it would be helpful.

John Bang

11/16/2023, 5:06 PM
I'd be interested to read that post when released!

marrrcin

11/16/2023, 5:37 PM

John Bang

11/16/2023, 7:18 PM
Thanks for the quick reply and great article. Sent you some claps on your LinkedIn post about it.