# questions
Abhishek Bhatia
Hello kedroids! :kedro: I am following this article to implement dynamic pipelines: https://getindata.com/blog/kedro-dynamic-pipelines/ Is it possible to have dynamic pipelines without defining the distinct values in the code? It would be great if there were a way to implement this with dynamic values defined in config!
Iñigo Hidalgo
I’ll let the author (@marrrcin) give the conclusive answer, but by default the parameters are not available when the pipelines are being registered, which is why in the article they explicitly define the values. We’ve worked around that by manually saving the parameters using an `after_context_created` hook, which is run before `register_pipelines` (in most cases). I think in the article this is considered “not the true kedro way”, which is why I’ll let Marcin give his view, but there are definitely ways to do it, although it might go against the recommended way to use kedro.
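Roughly, the workaround looks something like this (just a sketch, assuming Kedro >= 0.18; the `ProjectHooks` class name and the `DYNAMIC_PARAMS` store are illustrative names, not our exact code):

```python
# hooks.py, a minimal sketch of the "save params early" workaround.
# Register the hook in settings.py with: HOOKS = (ProjectHooks(),)
from kedro.framework.hooks import hook_impl

# module-level store that pipeline_registry.py can import later
DYNAMIC_PARAMS: dict = {}


class ProjectHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        # context.params is the merged parameters dict from the config loader;
        # stash it so register_pipelines(), which normally has no access to
        # parameters, can read it when building pipelines dynamically.
        DYNAMIC_PARAMS.update(context.params)
```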
marrrcin
Couldn’t agree more @Iñigo Hidalgo 👍 @Abhishek Bhatia the whole purpose of doing it the way described in the article is to have it “dynamic” while still preserving reproducibility. It’s the same discussion every time, so you can look up “dynamic pipelines” in this Slack’s history to get more context.
I mean - it’s directly there
Abhishek Bhatia
Got it! If the unique values are known beforehand, then I totally agree that it's a super elegant way of achieving dynamic pipelines 👍 @Iñigo Hidalgo Could you elaborate a bit on how saving the parameters after the context is created helps you generate pipelines dynamically? Do you read the parameters -> fetch dynamic values -> generate many dynamic pipelines (namespaced / tagged)?
Iñigo Hidalgo
Our use case for dynamic pipelines has a very narrow scope. We want to run multiple models in parallel to predict a single target, basically a live scoreboard for various models.
The way we build additional models is by adding a new subconfig under a certain "signal"'s config, so adding a new model under a specific config key means we need to build a new pipeline. From the pipeline registry we read the parameters, iterate over all the models, add a new pipeline for each key, and pass the inner config to the train-predict pipelines.
This is okay for us as we do not really care about repeatability, and it is a very limited scope within one project. I think it would very quickly get out of hand if we were doing more dynamic stuff, but since the config for each additional model 1. lives in the same place and 2. has the same format, it is complexity which we have under control.
in this context, in parallel means multiple models running in production alongside each other, not that their execution happens in parallel
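In code, the registry ends up looking roughly like this (a sketch under a few assumptions: a "models" key in parameters, a reusable train-predict pipeline factory, and the `DYNAMIC_PARAMS` store from the hook sketch earlier; the module paths are illustrative, not our exact project code):

```python
# pipeline_registry.py, a rough sketch of the pattern described above
from kedro.pipeline import Pipeline, pipeline

from my_project.hooks import DYNAMIC_PARAMS  # filled by after_context_created
from my_project.pipelines.train_predict import create_pipeline as train_predict


def register_pipelines() -> dict[str, Pipeline]:
    # e.g. models: {"xgboost": {...}, "lightgbm": {...}} in parameters.yml
    model_configs = DYNAMIC_PARAMS.get("models", {})

    model_pipelines = []
    for model_name in model_configs:
        # one namespaced copy of the train/predict pipeline per config key,
        # so each model's datasets and parameters stay isolated
        model_pipelines.append(
            pipeline(
                train_predict(),
                namespace=model_name,
                tags=[model_name],
            )
        )

    scoring = sum(model_pipelines, Pipeline([]))
    return {"__default__": scoring, "scoring": scoring}
```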
Abhishek Bhatia
Thanks! Makes sense. Sometimes it's also better to handle the dynamic nature inside the nodes rather than dynamically generating many pipelines
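For example, something like this, where the loop over model configs lives inside a single static node (the sklearn estimators and the config shape are illustrative assumptions, not anyone's actual project code):

```python
# nodes.py, a tiny sketch of the "handle it inside the node" alternative
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LinearRegression, Ridge

# illustrative registry of estimator templates keyed by config "type"
MODEL_ZOO = {"linear": LinearRegression(), "ridge": Ridge()}


def train_all_models(features: pd.DataFrame, target: pd.Series, model_configs: dict) -> dict:
    """Train every model listed in the parameters, all within one node."""
    trained = {}
    for name, cfg in model_configs.items():
        # e.g. cfg = {"type": "ridge", "params": {"alpha": 0.5}}
        estimator = clone(MODEL_ZOO[cfg["type"]]).set_params(**cfg.get("params", {}))
        trained[name] = estimator.fit(features, target)
    return trained
```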