#questions

Ana Man

03/14/2023, 4:20 PM
Hi everyone! I have a scenario and I wanted to see how people resolve this in their projects. Let's say you have a modular pipeline package that has a pipeline with 9 nodes (called pipe1). You want to amend this pipeline to accommodate two conditions. Condition 1 relies on the pipeline as it is. Condition 2 requires a small change: the addition of 2 nodes to the pipeline. What would be the best-practice way to extend this pipeline while ensuring backward compatibility?
I was thinking of creating separate pipeline packages (pipe1 and pipe2) to deal with the two conditions, but they would share a lot of the same logic, so I'm unsure about that solution. I was also thinking of writing two pipelines in the same package (one amended, one not), putting them into a dict (pipeline_name: pipeline), and then selecting the pipeline I need in my registry with
create_pipeline(run_pipe="pipe1")
(implementing the logic to select from the dict inside create_pipeline). I'm unsure how people deal with small variations like this in their pipelines. Hope that makes sense.

Nok Lam Chan

03/14/2023, 4:31 PM
In this case, wouldn’t it just be a matter of summing the pipelines? Equivalent to
__default__: de + ds
in the starters

Ana Man

03/14/2023, 4:42 PM
I'm unsure how this solves the problem.
If you sum the pipelines you would be executing pipe1 + pipe2, where pipe1 = n1 + n2 + n3 + n4 + n5 + n6 + n7 + n8 + n9 and pipe2 = n1 + n2 + n3 + n4 + n5 + n6 + n7 + n8 + n9 + n10 + n11 (for example).
That doesn't help in this situation, as I want to slightly amend pipe1 to work with both 9 and 11 nodes (condition 1 and condition 2), where the logic is slightly different for the two conditions but a lot of the same core nodes are used.

Nok Lam Chan

03/14/2023, 5:12 PM
Can’t you have a subpipeline which is just n10 + n11 and pipe2 = pipe1 + subpipeline?

Ana Man

03/14/2023, 5:22 PM
Yes, that makes sense!
But what about the following scenario:
pipe1 = n1 + n2 + n3 (depends on output of n1, n2) + n4 + n5 + n6
pipe2 = n1 + n2 + n3 (modified, depends on output of n1, n2) + n4 + n5 + n6 + n7
What would you do in this situation, where you need a lot of the same logic but one of the nodes in the pipeline has to be modified (e.g. to take an extra input)?
I have found a solution to this issue. Using tags on the nodes allows me to filter them, so the following works without having to duplicate logic. A simpler example:
pipe1 = n1(tag=core) + n2(tag=core) + n3 (depends on output of n1, n2) + n4(tag=core) + n5(tag=core) + n6(tag=core)
pipe2 = pipe1.only_nodes_with_tags("core") + n3 (modified, depends on output of n1, n2) + n7
That way I can separate my pipeline packages cleanly and scale the solution!