# questions
q
I'm having trouble reasoning through a pipeline configuration (I'm not working with ML models, but I'm using Kedro anyway). My situation is that I'm effectively using Kedro as glue between different simulation tools that others have written. Let's say there are two pipelines, A and B:
1. No inputs to A except for parameters
2. One output from A
3. One input to B, along with parameters

The input to B can be the output from A, or I can use parameters to generate a stand-in input. It seems to me that the right way to approach this is to create a node within pipeline B, something like "prep_input". How do I include it when pipeline B is run by itself, but not when I want to run A + B?
n
Kedro mostly sticks with a static pipeline philosophy; in that sense, this node in B is a dynamic node, because it does different things when run alone versus together with other nodes. Is it possible to create a separate node instead? If not, you may find hooks useful.
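If you do go the hooks route, here is a minimal sketch of the idea, not a tested implementation: it assumes Kedro's `before_pipeline_run` hook spec, `make_stand_in` is a hypothetical helper you would write, and the hook would need to be registered in `settings.py` via `HOOKS = (StandInInputHook(),)`.
```python
# illustrative sketch only
from kedro.framework.hooks import hook_impl
from kedro.io import MemoryDataset


class StandInInputHook:
    """Inject a stand-in "my_dataset" when pipeline B runs on its own."""

    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # run_params["pipeline_name"] carries the --pipeline value from the CLI
        if run_params.get("pipeline_name") == "B":
            params = catalog.load("params:stand-in-dataset")
            # make_stand_in is a hypothetical helper that builds the dataset
            catalog.add("my_dataset", MemoryDataset(make_stand_in(params)))
```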
l
@quantumtrope Hi, I am not from the Kedro team, but what do you think of solving it like this:
• Create pipeline A
• Create pipeline B
  ◦ separately, it takes data from parameters to create a stand-in dataset
• Create pipeline B+A
  ◦ B takes its input dataset from A
```python
from kedro.pipeline import Pipeline, node, pipeline


def register_pipelines() -> dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    # pipeline A creates a dataset which can be used as input to B
    pipeline_A = pipeline([
        node(func=input_A_to_my_dataset, inputs="params:parameters-for-A", outputs="my_dataset")
    ])

    # you can also create the input to B from parameters
    input_from_params_B = pipeline([
        node(func=dataset_from_params, inputs="params:stand-in-dataset", outputs="my_dataset")
    ])

    # given an input, this is the 'core' pipeline for B
    # (the output must be named differently from the input: a Kedro node
    # cannot produce the same dataset it consumes)
    core_B = pipeline([
        node(func=do_B_with_dataset, inputs=["my_dataset", "params:parameters-for-B"], outputs="B_output")
    ])

    pipelines: dict[str, Pipeline] = dict(
        A=pipeline_A,
        B=input_from_params_B + core_B,
        B_with_A=pipeline_A + core_B,
    )

    pipelines["__default__"] = pipelines["B_with_A"]

    return pipelines
```
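Each combination can then be selected at run time with the standard CLI, e.g. `kedro run --pipeline=A`, `kedro run --pipeline=B`, or `kedro run --pipeline=B_with_A`, while plain `kedro run` picks up `__default__`.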
q
@Lodewic van Twillert Thank you! I think that makes a lot of sense. Just create different combinations of pipelines that are already defined. The automatic pipeline inference for `register_pipelines` works so well I forgot that I should just overwrite it.
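For context, the automatic inference mentioned here is roughly what recent Kedro project templates generate via `find_pipelines()`; the manual version earlier in the thread simply replaces this default:
```python
from kedro.framework.project import find_pipelines
from kedro.pipeline import Pipeline


def register_pipelines() -> dict[str, Pipeline]:
    # find_pipelines() walks the pipelines/ package and auto-discovers
    # every create_pipeline() it finds
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines
```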