Hello! Could someone tell me the difference between specifying... (Kedro #questions)


Francis Duval

01/18/2024, 4:37 PM

Hello! Could someone tell me the difference between specifying inputs/outputs inside the `pipeline` function vs. specifying inputs/outputs inside the `node` function? It seems redundant, because these two snippets seem to yield the same thing:


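from kedro.pipeline import Pipeline, node, pipeline


def create_pipeline(**kwargs) -> Pipeline:
    pipe1 = pipeline(
        pipe=[
            node(
                func=mafonc1,
                inputs='params:nombre',
                outputs='result1'
            ),
            node(
                func=mafonc2,
                inputs='result1',
                outputs='result2'
            )
        ],
        namespace='ns1',
        # Parameters go through `parameters`; pipeline() rejects
        # 'params:' names passed in `inputs`.
        parameters='params:nombre',
        outputs='result2'
    )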

    return pipe1


from kedro.pipeline import Pipeline, node, pipeline


def create_pipeline(**kwargs) -> Pipeline:
    pipe1 = pipeline(
        pipe=[
            node(
                func=mafonc1,
                inputs='params:nombre',
                outputs='result1'
            ),
            node(
                func=mafonc2,
                inputs='result1',
                outputs='result2'
            )
        ],
        namespace='ns1'
    )

    return pipe1


Nok Lam Chan

01/18/2024, 4:55 PM

https://docs.kedro.org/en/stable/nodes_and_pipelines/modular_pipelines.html#using-the-modular-pipeline-wrapper-to-provide-overrides

Nok Lam Chan

01/18/2024, 4:55 PM

TL;DR: the one in `node` is for specifying the inputs and outputs; the one in `pipeline` is to optionally escape the namespace.

Nok Lam Chan

01/18/2024, 4:56 PM

i.e. whenever you provide the `namespace` argument, all inputs and outputs will be read as `namespace.xxx` instead of `xxx`. In some cases you do want them to read the non-namespaced version, and then you need to provide that name in the `pipeline` arguments.
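To make the renaming rule concrete, here is a minimal pure-Python sketch applied to the two nodes from the question. It is a simplified model, not Kedro's implementation: the helpers `namespaced` and `resolve` are hypothetical, and the `params:ns1.nombre` form for namespaced parameters is an assumption based on Kedro 0.19-style behavior (older versions left parameters un-prefixed). The sketch treats every exposed name alike, whereas recent Kedro versions take parameter names through the wrapper's separate `parameters` argument.

```python
# Hypothetical, simplified model of namespace renaming; not Kedro's implementation.

def namespaced(name, namespace, exposed):
    """Return the name a node's dataset gets inside `namespace`."""
    if name in exposed:
        return name  # listed on the wrapper: escapes the namespace
    if name.startswith("params:"):
        # Assumption (Kedro 0.19-style): params:x -> params:<namespace>.x
        return f"params:{namespace}.{name[len('params:'):]}"
    return f"{namespace}.{name}"  # everything else gets the prefix


def resolve(nodes, namespace, exposed=frozenset()):
    """Rename the inputs and outputs of each (inputs, outputs) node pair."""
    return [
        (
            [namespaced(n, namespace, exposed) for n in ins],
            [namespaced(n, namespace, exposed) for n in outs],
        )
        for ins, outs in nodes
    ]


# The two nodes from the question, as (inputs, outputs) pairs.
NODES = [(["params:nombre"], ["result1"]), (["result1"], ["result2"])]

# Snippet 1 exposes params:nombre and result2; snippet 2 exposes nothing.
snippet_1 = resolve(NODES, "ns1", exposed={"params:nombre", "result2"})
snippet_2 = resolve(NODES, "ns1")

print(snippet_1)
# [(['params:nombre'], ['ns1.result1']), (['ns1.result1'], ['result2'])]
print(snippet_2)
# [(['params:ns1.nombre'], ['ns1.result1']), (['ns1.result1'], ['ns1.result2'])]
```

In the first case `params:nombre` and `result2` keep their global names, so the catalog and other pipelines can refer to them directly; in the second, only `ns1.`-prefixed names exist. That is why the two snippets are not equivalent, even though the node wiring is identical.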

Nok Lam Chan

01/18/2024, 4:57 PM

Tip: try printing the pipeline Python object; you should see the difference.

Nok Lam Chan

01/18/2024, 4:58 PM

If you have this in a Kedro project, you can run `kedro ipython`, then just print the `pipelines` object.

Francis Duval

01/18/2024, 6:11 PM

Thank you, this is clear! So inputs/outputs for `pipeline` are optional, whereas inputs/outputs for `node` are mandatory.

Nok Lam Chan

01/18/2024, 6:12 PM

Yes, you only need to provide them for `pipeline` if you need to escape the namespace.
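Why the wrapper's `inputs`/`outputs` can be optional while the node's cannot: the node arguments define the graph, and a pipeline's free inputs and outputs can be derived from that graph. The sketch below uses hypothetical helpers in plain Python (not Kedro's code) to infer them for the nodes in the question; in Kedro, `Pipeline.inputs()` and `Pipeline.outputs()` report the same idea.

```python
# Hypothetical sketch: derive a pipeline's free inputs/outputs from its nodes.
# Each node is modeled as an (inputs, outputs) pair of name lists.

def free_inputs(nodes):
    """Names some node consumes but no node in the pipeline produces."""
    produced = {name for _, outs in nodes for name in outs}
    return {name for ins, _ in nodes for name in ins} - produced


def free_outputs(nodes):
    """Names some node produces but no node in the pipeline consumes."""
    consumed = {name for ins, _ in nodes for name in ins}
    return {name for _, outs in nodes for name in outs} - consumed


# The two nodes from the question (mafonc1, then mafonc2).
NODES = [(["params:nombre"], ["result1"]), (["result1"], ["result2"])]

print(free_inputs(NODES))   # {'params:nombre'}
print(free_outputs(NODES))  # {'result2'}
```

The inferred sets are exactly the names the first snippet listed by hand, which is why it looked redundant; listing them on `pipeline` only decides whether they keep their un-namespaced names.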

Nok Lam Chan

01/18/2024, 6:14 PM

The function signature should have reflected this; if it doesn't, that's something we should fix :)

Francis Duval

01/18/2024, 6:17 PM

Oh yes, I just saw this in modular_pipeline.py:


inputs: A name or collection of input names to be exposed as connection points
    to other pipelines upstream. This is optional; if not provided, the
    pipeline inputs are automatically inferred from the pipeline structure.
    When str or set[str] is provided, the listed input names will stay
    the same as they are named in the provided pipeline.
    When dict[str, str] is provided, current input names will be
    mapped to new names.
    Must only refer to the pipeline's free inputs.
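The three accepted forms in that docstring can be modeled as normalizing to a mapping from current name to new name. This is a hypothetical sketch: `to_mapping` and `check_free_inputs` are illustrative helpers, not Kedro internals, and it assumes a wrapped pipeline containing only the question's second node, whose sole free input is `result1` (`upstream_result` is likewise a made-up name).

```python
# Hypothetical model of the `inputs` rules quoted above; not Kedro's own code.

def to_mapping(inputs):
    """Normalize an `inputs` value to {current_name: new_name}."""
    if inputs is None:
        return {}  # not provided: nothing is listed explicitly
    if isinstance(inputs, str):
        return {inputs: inputs}  # a single name keeps its current name
    if isinstance(inputs, dict):
        return dict(inputs)  # current names mapped to new names
    return {name: name for name in inputs}  # set[str]: names kept as-is


def check_free_inputs(mapping, free_inputs):
    """Raise if any mapped name is not a free input of the wrapped pipeline."""
    unknown = set(mapping) - set(free_inputs)
    if unknown:
        raise ValueError(f"not free inputs: {sorted(unknown)}")
    return mapping


# Assumption: the wrapped pipeline holds only the mafonc2 node,
# so its only free input is 'result1'.
FREE = {"result1"}

print(check_free_inputs(to_mapping("result1"), FREE))
# {'result1': 'result1'}
print(check_free_inputs(to_mapping({"result1": "upstream_result"}), FREE))
# {'result1': 'upstream_result'}

try:
    check_free_inputs(to_mapping("result2"), FREE)  # an output, not a free input
except ValueError as err:
    print(err)  # not free inputs: ['result2']
```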
