A quick clarification on registering pipelines (Kedro #questions)

Emilio Gagliardi

07/03/2023, 6:39 PM

A quick clarification on registering pipelines. When I install the spaceflights demo, the pipeline_registry.py file contains the following register_pipelines() function:

def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines

However, in the spaceflights tutorial videos I'm watching, the host doesn't use the above code. Instead, they add the following:

data_processing_pipeline = dp.create_pipeline()
return {
    "__default__": data_processing_pipeline,
    "dp": data_processing_pipeline,
}

So I'm unclear what I'm supposed to do for my own project. Do I just use sum(pipelines.values()), or do I manually add pipelines as in the second block? Thanks kindly,

Yetunde

07/03/2023, 9:15 PM

Hey @Emilio Gagliardi 😄 Thanks for your fantastic question. Let me explain this one and also ask you a question. What you're seeing is a usability difference between Kedro versions. In previous versions of Kedro, we made users list out their pipelines by hand. We made the change to:

def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines

We did this so that users no longer have to list their pipelines by hand, but we kept the file so that users can still define their own combinations of pipelines by editing it. So the code you see in:

data_processing_pipeline = dp.create_pipeline()
return {
    "__default__": data_processing_pipeline,
    "dp": data_processing_pipeline,
}

works as well. My question is: which video are you watching? We need to update it.

Emilio Gagliardi

07/03/2023, 9:52 PM

Thanks very much @Yetunde, I think I understand. If I leave the code as-is, then the default pipeline is always ALL of the registered pipelines. But if I want to access different combinations of individual pipelines, then I need to use the manual method. Can you show me what it would look like to use the current method and the manual method together? Or is it one or the other? I was referencing these tutorials:

QuantumBlack

DataTalksClub

Deepyaman Datta

07/03/2023, 10:14 PM

@Emilio Gagliardi you can manually register other combinations of pipelines like so:

from typing import Dict

from kedro.framework.project import find_pipelines
from kedro.pipeline import Pipeline


def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    # Assumes modular pipelines named "de" and "dp" exist in the project.
    pipelines["no_modeling_for_me"] = pipelines["de"] + pipelines["dp"]
    return pipelines
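
To see why the automatic and manual methods can live in the same function, here is a rough, Kedro-free sketch. ToyPipeline and toy_find_pipelines are hypothetical stand-ins invented for illustration (they are not Kedro's classes), the "ds" pipeline and all node names are made up, and the sketch assumes that adding two pipelines yields the union of their nodes.

```python
from typing import Dict, FrozenSet, Iterable


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: just a set of node names."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes: FrozenSet[str] = frozenset(nodes)

    def __add__(self, other: "ToyPipeline") -> "ToyPipeline":
        # Assumption: combining two pipelines yields the union of their nodes.
        return ToyPipeline(self.nodes | other.nodes)

    def __radd__(self, other: object) -> "ToyPipeline":
        # sum() starts from the integer 0, so 0 + pipeline must return the pipeline.
        if other == 0:
            return self
        return NotImplemented


def toy_find_pipelines() -> Dict[str, ToyPipeline]:
    """Hypothetical replacement for automatic discovery, with hard-coded pipelines."""
    return {
        "de": ToyPipeline(["load_raw", "clean"]),
        "dp": ToyPipeline(["build_features", "split"]),
        "ds": ToyPipeline(["train", "evaluate"]),
    }


def register_pipelines() -> Dict[str, ToyPipeline]:
    pipelines = toy_find_pipelines()
    # Automatic part: the default is every discovered pipeline combined.
    pipelines["__default__"] = sum(pipelines.values())
    # Manual part: an extra named combination, added to the same mapping.
    pipelines["no_modeling_for_me"] = pipelines["de"] + pipelines["dp"]
    return pipelines


registry = register_pipelines()
print(sorted(registry))
print(sorted(registry["__default__"].nodes))
print(sorted(registry["no_modeling_for_me"].nodes))
```

The returned mapping is an ordinary dictionary, so discovered entries and hand-written combinations sit side by side. In a real project, a named entry such as this one would typically be selected with `kedro run --pipeline no_modeling_for_me`.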

👍 1

Emilio Gagliardi

07/03/2023, 10:51 PM

thank you kindly @Deepyaman Datta that is helpful.

👌 1
