# questions
a
Hi, I'm exploring resources similar to https://getindata.com/blog/kedro-dynamic-pipelines/ for insights on generating dynamic pipelines. The linked blog post covers scenarios where we need to accommodate multiple experiments involving varying features, model parameters, and model types. However, I'm also interested in another use case. Suppose I want to conduct experiments that require additional nodes or pipelines. For instance, if the namespace is A, I'd like to trigger extra nodes or pipelines alongside the standard ones that all experiments follow. This means generating additional inputs/outputs specifically for namespace A. Are there any resources discussing this scenario?
d
So dynamic pipelines are an interesting and often-repeated conversation, as they cloud Kedro's commitment to reproducibility. Importantly, there is a distinction between:
1. Dynamic configuration, static pipeline structure
2. Static configuration, dynamic pipeline structure
3. Both dynamic

The GetInData post addresses option 1, which we endorse because it's much easier to debug and grok what's going on. The minute you introduce 2 or 3, the combinatorial complexity massively increases. This is why we've resisted the temptation to introduce a conditional operator in Kedro. There are ways to achieve 2 and 3, but the friction you're feeling is intentional, and there aren't really any approaches we as a developer team officially endorse. Personally speaking, I think this is a case where it's better to duplicate code into isolated approaches than to build an increasingly complicated monolith that the next person (or yourself in 12 months' time) can't really understand.
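For readers following along, here is a minimal sketch of what option 1 (dynamic configuration, static structure) can look like with Kedro's modular `pipeline()` helper, in the spirit of the GetInData post. The `train_model` function, parameter block, and experiment names are hypothetical, not from the post:

```python
from kedro.pipeline import Pipeline, node, pipeline


def train_model(model_input, model_options):
    # Placeholder: a real node would fit a model here.
    return {"model": None, "options": model_options}


def create_pipeline() -> Pipeline:
    # The node graph is identical for every experiment.
    base = pipeline(
        [node(train_model, ["model_input", "params:model_options"], "model_output")]
    )
    # Namespacing rewrites "params:model_options" to
    # "params:<namespace>.model_options" and "model_output" to
    # "<namespace>.model_output"; the shared "model_input" is kept unprefixed
    # via the identity mapping below. Only configuration varies per namespace.
    variants = [
        pipeline(base, inputs={"model_input": "model_input"}, namespace=ns)
        for ns in ("experiment_a", "experiment_b")
    ]
    return sum(variants, Pipeline([]))
```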
m
Seems to me like you @Afiq Johari need a separate pipeline. Since you can generate them programmatically, you can just generate multiple pipelines, add/remove the specific nodes you like during pipeline generation, and then just run
`kedro run --pipeline=with_extra_nodes`
or
`kedro run --pipeline=without_extra_nodes`
etc.
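A hedged sketch of what that registration-time approach could look like in `pipeline_registry.py`; `standard_step` and `extra_step` are hypothetical stand-ins for the real nodes:

```python
from kedro.pipeline import Pipeline, node, pipeline


def standard_step(model_input):
    # Placeholder for the transformation every experiment runs.
    return model_input


def extra_step(model_input):
    # Placeholder for the extra work only namespace A needs.
    return model_input


def register_pipelines() -> dict[str, Pipeline]:
    common = pipeline(
        [node(standard_step, "model_input", "model_output", name="standard_step")]
    )
    extras = pipeline(
        [node(extra_step, "model_input", "extra_output", name="extra_step")]
    )
    # Both variants are built up front, so each individual run is still a
    # static, reproducible pipeline selected by --pipeline=<name>.
    return {
        "without_extra_nodes": common,
        "with_extra_nodes": common + extras,
        "__default__": common,
    }
```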
a
@datajoely @marrrcin I'm wondering if there are any case studies that show how people handle dynamic pipelines. It seems like a topic that comes up a lot, so I imagine there might be different ways to deal with it. In my case, the big challenge is fitting different models into the pipeline. Even though they all start from the same `model_input`, each needs different data transformations before it can be called. For example, Model A, Model B, and Model C all need different kinds of changes to the data before they can be trained, partly because they come from different libraries. Model A may need only the `model_input` and can start training on it straight away. Models B and C, however, may require additional transformations of the `model_input` before training can start, and those transformations are likely different for each of them. But when it comes to what they give us as output, they're all the same, since the `model_output` will have the same format. So I'm thinking I might need some dynamic pipelines in between the static pipelines. In the future, if we want to add new models, we'll want them to follow the same rules: they should all start from the same `model_input` and give us the same `model_output`, but have the freedom to do extra transformations of the `model_input` before they start training.
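One possible shape for this, sketched with Kedro's modular `pipeline()` helper: each model gets its own namespaced sub-pipeline that starts from the shared `model_input` and ends at a uniformly named `model_output`, but is free to insert model-specific transformation nodes in between. All function and namespace names here are illustrative assumptions, not an officially endorsed pattern:

```python
from kedro.pipeline import Pipeline, node, pipeline


def train_model_a(model_input):
    # Model A trains directly on model_input.
    return "model_a"


def transform_for_b(model_input):
    # Model-B-specific preparation of model_input.
    return model_input


def train_model_b(transformed_input):
    return "model_b"


def create_model_pipelines() -> Pipeline:
    model_a = pipeline(
        [node(train_model_a, "model_input", "model_output", name="train")],
        namespace="model_a",
        inputs={"model_input": "model_input"},  # shared entry point, unprefixed
    )
    model_b = pipeline(
        [
            node(transform_for_b, "model_input", "transformed_input", name="transform"),
            node(train_model_b, "transformed_input", "model_output", name="train"),
        ],
        namespace="model_b",
        inputs={"model_input": "model_input"},
    )
    # Each namespace ends at "<namespace>.model_output", all sharing one format,
    # so a new model only has to respect the same entry and exit datasets.
    return model_a + model_b
```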