Hi everyone! I have a scenario and I wanted to see how people… (Kedro #questions)


Ana Man

03/14/2023, 4:20 PM

Hi everyone! I have a scenario and I wanted to see how people resolve this in their projects. Let's say you have a modular pipeline package containing a pipeline with 9 nodes (called pipe1). You want to amend this pipeline to accommodate two conditions. Condition 1 relies on the pipeline as it is. Condition 2 requires a small change: the addition of 2 nodes to the pipeline. What would be the best-practice way to extend this pipeline while ensuring backward compatibility?

Ana Man

03/14/2023, 4:20 PM

I was thinking of creating separate pipeline packages (pipe1 and pipe2) to deal with the two conditions, but they would share a lot of the same logic, so I'm unsure about that solution. I was also thinking of writing two pipelines in the same package (one amended, one not), putting them into a dict (pipeline_name: pipeline) and then selecting the pipeline I need in my registry with

create_pipeline(run_pipe="pipe1")

(with the logic to select from the dict implemented in create_pipeline). I'm unsure how people deal with small variations like this in their pipelines. Hope that makes sense!

Nok Lam Chan

03/14/2023, 4:31 PM

In this case, wouldn’t it just be a matter of summing the pipelines? Equivalent to

__default__: de + ds

in the starters

Ana Man

03/14/2023, 4:42 PM

unsure how this solves the problem

Ana Man

03/14/2023, 4:43 PM

If you sum the pipelines you would be executing pipe1 + pipe2, where pipe1 = n1 + n2 + n3 + n4 + n5 + n6 + n7 + n8 + n9 and pipe2 = n1 + n2 + n3 + n4 + n5 + n6 + n7 + n8 + n9 + n10 + n11 (for example).

Ana Man

03/14/2023, 4:46 PM

That doesn't help in this situation, as I want to slightly amend pipe1 to work with both 9 and 11 nodes (condition 1 and condition 2), where the logic is slightly different for each condition but uses a lot of the same core nodes.

Nok Lam Chan

03/14/2023, 5:12 PM

Can’t you have a subpipeline which is just n10 + n11 and pipe2 = pipe1 + subpipeline?

Ana Man

03/14/2023, 5:22 PM

Yes that makes sense!

Ana Man

03/14/2023, 5:23 PM

But what about the following scenario:

Ana Man

03/14/2023, 5:23 PM

pipe1 = n1 + n2 + n3 (depends on outputs of n1, n2) + n4 + n5 + n6

Ana Man

03/14/2023, 5:23 PM

pipe2 = n1 + n2 + n3 (modified; depends on outputs of n1, n2) + n4 + n5 + n6 + n7

Ana Man

03/14/2023, 5:24 PM

What would you do in this situation, where you need a lot of similar logic but one of the nodes in the pipeline needs to be modified (e.g. an extra input)?

Ana Man

03/15/2023, 10:23 AM

I have found a solution to this issue. Using tags on the nodes lets me filter them, so the following works without having to duplicate logic (simpler example):

pipe1 = n1(tag=core) + n2(tag=core) + n3(depends on outputs of n1, n2) + n4(tag=core) + n5(tag=core) + n6(tag=core)

pipe2 = pipe1.only_nodes_with_tags("core") + n3(modified; depends on outputs of n1, n2) + n7

That way I can separate my pipeline packages cleanly and scale the solution!
