# questions
q
I'm having trouble reasoning through a pipeline configuration (I'm not working with ML models, but I'm using Kedro anyway). My situation is that I'm effectively using Kedro as glue between different simulation tools that others have written. Let's say there are two pipelines, A and B:
1. No inputs to A except for parameters
2. One output from A
3. One input to B, along with parameters

The input to B can be the output from A, or I can use parameters to generate a stand-in input. It seems to me that the right way to approach this is to create a node within pipeline B, something like "prep_input". How do I include it when pipeline B is run by itself, but not when I want to run A + B?
n
Kedro mostly sticks with a static pipeline philosophy; in that sense, this node in B is a dynamic node, because it does different things when run alone versus together with other nodes. Is it possible to create a separate node instead? If not, you may find hooks useful.
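If you do go the hooks route, here is a minimal sketch of the idea, not a tested implementation: it assumes Kedro's `before_pipeline_run` hook spec, `make_stand_in` is a hypothetical helper you would write, and the hook would need to be registered in `settings.py` via `HOOKS = (StandInInputHook(),)`.
```python
# illustrative sketch only
from kedro.framework.hooks import hook_impl
from kedro.io import MemoryDataset


class StandInInputHook:
    """Inject a stand-in "my_dataset" when pipeline B runs on its own."""

    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # run_params["pipeline_name"] carries the --pipeline value from the CLI
        if run_params.get("pipeline_name") == "B":
            params = catalog.load("params:stand-in-dataset")
            # make_stand_in is a hypothetical helper that builds the dataset
            catalog.add("my_dataset", MemoryDataset(make_stand_in(params)))
```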
l
@quantumtrope Hi, I am not from the Kedro team, but what do you think of solving it like this:
• Create pipeline A
• Create pipeline B
  ◦ separately, it takes data from parameters to create a stand-in dataset
• Create pipeline B+A
  ◦ B takes its input dataset from A
```python
from kedro.pipeline import Pipeline, node, pipeline


def register_pipelines() -> dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    # pipeline A creates a dataset which can be used as input to B
    pipeline_A = pipeline([
        node(func=input_A_to_my_dataset, inputs="params:parameters-for-A", outputs="my_dataset")
    ])

    # you can also create the input to B from parameters
    input_from_params_B = pipeline([
        node(func=dataset_from_params, inputs="params:stand-in-dataset", outputs="my_dataset")
    ])

    # given an input, this is the 'core' pipeline for B
    # (the output must be named differently from the input: a Kedro node
    # cannot produce the same dataset it consumes)
    core_B = pipeline([
        node(func=do_B_with_dataset, inputs=["my_dataset", "params:parameters-for-B"], outputs="B_output")
    ])

    pipelines: dict[str, Pipeline] = dict(
        A=pipeline_A,
        B=input_from_params_B + core_B,
        B_with_A=pipeline_A + core_B,
    )

    pipelines["__default__"] = pipelines["B_with_A"]

    return pipelines
```
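Each combination can then be selected at run time with the standard CLI, e.g. `kedro run --pipeline=A`, `kedro run --pipeline=B`, or `kedro run --pipeline=B_with_A`, while plain `kedro run` picks up `__default__`.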
q
@Lodewic van Twillert Thank you! I think that makes a lot of sense. Just create different combinations of pipelines that are already defined. The automatic pipeline inference for `register_pipelines` works so well I forgot that I should just overwrite it.
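For context, the automatic inference mentioned here is roughly what recent Kedro project templates generate via `find_pipelines()`; the manual version earlier in the thread simply replaces this default:
```python
from kedro.framework.project import find_pipelines
from kedro.pipeline import Pipeline


def register_pipelines() -> dict[str, Pipeline]:
    # find_pipelines() walks the pipelines/ package and auto-discovers
    # every create_pipeline() it finds
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines
```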