Hi, Currently I built a modular pipeline that is b...
# questions
s
Hi, I have built a modular pipeline that is being reused for different inputs. I was wondering whether it is possible to add some fixed nodes at the end, within the same pipeline. I need to add one node to the end of data_pipeline1 and a different one to the end of data_pipeline2. Should I create a separate pipeline that consumes their outputs, or can I add the fixed nodes in this same file?
from kedro.pipeline import Pipeline, node
from kedro.pipeline.modular_pipeline import pipeline

# preprocess_json and expand_column_df are assumed to be defined elsewhere,
# e.g. in this pipeline's nodes module.

def create_pipeline(**kwargs) -> Pipeline:
    template_pipeline = pipeline(
        [
            node(
                func=preprocess_json,
                inputs="input1",
                outputs="intermediary_output",
            ),
            node(
                func=expand_column_df,
                inputs=["intermediary_output", "params:override_me"],
                outputs="output",
            ),
        ]
    )
    
    data_pipeline1 = pipeline(
        pipe=template_pipeline,
        inputs={"input1":"df1"},
        parameters={"params:override_me": "params:df1_param"},
        outputs={"output":"df1_pos"},
        namespace="df1_pos",
    )

    data_pipeline2 = pipeline(
        pipe=template_pipeline,
        inputs={"input1":"df2"},
        parameters={"params:override_me": "params:df2_param"},
        outputs={"output":"df2_pos"},
        namespace="df2_pos",
    )

    final_pipeline = data_pipeline1 + data_pipeline2 
    return final_pipeline
n
What do you mean by a fixed node?
So I am guessing you have a few pipelines that run as parallel branches, and you want a final node connected to each of these branches?
s
Assume that I want to add a new node at the end of each pipeline. This node cannot belong to the modular pipeline, since it is going to have different functionality.
I just want to connect a new node called node_1 at the end of data_pipeline1 and a second node called node_2 at the end of data_pipeline2. These two nodes differ in functionality, which is why I didn't include them in the template_pipeline. What is the best approach for this?
n
That's fine, your pipeline doesn't have to be a pure modular pipeline.
Your template pipeline is fine as it is; each actual pipeline is then your modular pipeline + an extra node.
Taking your example, you will end up with 3 pipelines. Take p as a pipeline and m as a modular pipeline: p1 = m1 + m1_fix_node, p2 = m2 + m2_fix_node, p_all = p1 + p2.
s
It makes sense. To create m1_fix_node and m2_fix_node, can I just create the node and add it to the pipeline? Does the add operation work between nodes and pipelines?
n
A modular pipeline is essentially a “high-level” node. Conceptually it’s something like this.
Yes, nodes and pipelines can be combined with + or even - (more surprisingly); under the hood it all happens at the “node” level.
s
Wow 😃
Thanks @Nok Lam Chan
n
I mean, you can't directly add a node to a pipeline, but you could wrap it as a single-node pipeline, i.e. pipeline([node]).
s
OK, just to recap: I create these two new nodes, wrap each one in a pipeline, and add them as you mentioned before.
n
Correct. If you look at the starter, for example by running kedro new -s spaceflights, you will find that in pipeline_registry.py there is a sum(pipelines) operation. https://github.com/kedro-org/kedro-starters/blob/8ac843863c64b98df17bdefed87ec0569[…]%7B%20cookiecutter.python_package%20%7D%7D/pipeline_registry.py