# questions
a
Hey folks, what's the best approach for dynamically generating, slicing and dicing modular pipelines to make them scalable and apply different chunks to different inputs? For example, say I have some tabular data that requires a reasonably vanilla set of Pandas-type wrangling and cleaning operations. Say some of the data is already joined and some is not — so some groups of inputs require joining or concatenation and some do not — and some require further feature-generation calculations. Assume the number of groups is small but variable. I guess one could parameterise the inputs and use code to generate the nodes accordingly, but it all gets rather involved rather quickly. Would a better approach perhaps be to use `kedro.pipeline.Pipeline.filter` to chop up a template and then `kedro.pipeline.modular_pipeline.pipeline` to override the inputs/parameters etc.? Would be keen for any pointers to well-written, dynamically-generated and scalable kedro pipelines. I'm new to the framework but not new to Python. Thank you!
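For concreteness, here's a minimal sketch of that second approach (filter a tagged template, then namespace and remap per group). Everything specific in it is made up for illustration: the wrangling functions `_join`, `_clean` and `_add_features`, the tags, the `GROUPS` dict and dataset names like `sales_left_raw` are all hypothetical; the only Kedro pieces it relies on are `Pipeline.filter` and the `pipeline()` wrapper mentioned above.

```python
from kedro.pipeline import Pipeline, node, pipeline


def _join(left, right):
    # hypothetical join step
    return left.merge(right, how="left")


def _clean(df):
    # hypothetical wrangling/cleaning step
    return df.dropna()


def _add_features(df):
    # hypothetical feature-generation step
    return df.assign(total=df.sum(axis=1, numeric_only=True))


# Template pipeline: every node carries a tag so it can be sliced out later.
template = Pipeline(
    [
        node(_join, ["left_raw", "right_raw"], "joined", tags="join"),
        node(_clean, "joined", "clean", tags="clean"),
        node(_add_features, "clean", "features", tags="features"),
    ]
)


def make_group_pipeline(group: str, needs_join: bool, needs_features: bool) -> Pipeline:
    """Slice the template for one group, then namespace it and remap its inputs."""
    tags = ["clean"]
    if needs_join:
        tags.append("join")
    if needs_features:
        tags.append("features")

    sliced = template.filter(tags=tags)

    # Groups that arrive pre-joined feed the cleaning node straight from their raw dataset.
    if needs_join:
        inputs = {"left_raw": f"{group}_left_raw", "right_raw": f"{group}_right_raw"}
    else:
        inputs = {"joined": f"{group}_raw"}

    # namespace=group prefixes the intermediate datasets so groups stay independent;
    # parameters could be overridden here the same way as inputs.
    return pipeline(sliced, inputs=inputs, namespace=group)


# Small, variable set of groups: (needs_join, needs_features) per group.
GROUPS = {"sales": (True, True), "inventory": (False, False)}

full_pipeline = sum(
    (make_group_pipeline(g, j, f) for g, (j, f) in GROUPS.items()),
    Pipeline([]),
)
```

The appeal of this shape is that the template stays static and all the per-group variation lives in the small `GROUPS` mapping, so adding a group is a one-line change rather than more node-generation code.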
m
We're going to release a blog post on that soon, if you're interested
In a week or so
a
Yep, any tips would be welcome. Happy to read a draft if it would be helpful.
j
I'd be interested to read that post when released!
m
j
Thanks for the quick reply and great article. Sent you some claps on your LinkedIn post of this.
🎉 1