Hey folks what s the best approach for dynamically generatin Kedro #questions

Amos

10/03/2023, 6:58 AM

Hey folks, what's the best approach for dynamically generating, slicing and dicing modular pipelines to make them scalable and apply different chunks to different inputs? For example, say I have some tabular data that requires a reasonably vanilla set of Pandas-type wrangling and cleaning operations. Say some of the data is already joined and some is not, so some groups of inputs require joining or concatenation and some do not, and some require further feature-generation calculations. Assume the number of groups is small but variable. I guess one could parameterise the inputs and use code to generate the nodes accordingly, but it all gets rather involved rather quickly. Would a better approach perhaps be to use `kedro.pipeline.Pipeline.filter` to chop up a template and then `kedro.pipeline.modular_pipeline.pipeline` to override the inputs/parameters etc.? Would be keen for any pointers to well-written, dynamically-generated and scalable kedro pipelines. I'm new to the framework but not new to Python. Thank you!

marrrcin

10/03/2023, 8:27 AM

We're going to release a blog post on that soon, if you're interested

marrrcin

10/03/2023, 8:27 AM

In a week or so

Amos

10/03/2023, 11:20 AM

Yep, any tips would be welcome. Happy to read a draft if it would be helpful.

John Bang

11/16/2023, 5:06 PM

I'd be interested to read that post when released!

marrrcin

11/16/2023, 5:37 PM

It was already published https://getindata.com/blog/kedro-dynamic-pipelines/

👍 1

John Bang

11/16/2023, 7:18 PM

Thanks for the quick reply and the great article. Sent you some claps on your LinkedIn post about it.

🎉 1
