# questions
i
Hi team😊 I am quite new to kedro and I have a question regarding the use of multiple namespaces. I would like to have pipelines run:
• with two different models
• and with three different sets of data
so 6 pipelines in total. Right now I have only defined a namespace for the different sets of data, similar to the example:
ds_pipeline_first_set_of_data = pipeline(
    pipe=pipeline_instance,
    inputs="model_input_table",
    namespace="first_set_of_data",
)
and would like to add a namespace for the model type like this:
ds_pipeline_first_set_of_data_second_model = pipeline(
    pipe=pipeline_instance,
    inputs="model_input_table",
    namespace=["first_set_of_data", "second_model"],
)
Would this work and is it considered good practice? Thanks☺️
y
Hi! Two things:
1. `namespace`, if provided, should be a `str`, so a list wouldn't work. Concatenate namespaces using e.g. a `.`
2. Below is an example of how to make all of your 2 * 3 = 6 pipelines. The idea is to use a for loop.
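# Assumes a kedro version where these are exported from kedro.pipeline (0.18+);
# preprocess_data, create_model and evaluate_model are your own node functions.
from kedro.pipeline import Pipeline, node, pipeline

# Step 1. Define an abstract pipe to be used as a template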
_modeling_pipe = Pipeline([
    node(
        preprocess_data,
        inputs="input_data",
        outputs="model_input_data",
        name="preprocess_data",
    ),
    node(
        create_model,
        inputs="params:model_parameters",
        outputs="model",
        name="create_model",
    ),
    node(
        evaluate_model,
        inputs={
            "data": "model_input_data",
            "model": "model",
        },
        outputs="model_metrics",
        name="evaluate_model",
    )
])

# Step 2. Define your combinations
_INPUT_DATASETS = (
    "first_set_of_data",
    "second_set_of_data",
    "third_set_of_data",
)
_MODEL_PARAMETER_OPTIONS = (
    "first_model_parameters",
    "second_model_parameters",
)

# Step 3. Create pipelines for each combination, and sum them
individual_pipelines = tuple(
    pipeline(
        pipe=_modeling_pipe,
        inputs={
            "input_data": dataset,
        },
        parameters={
            "params:model_parameters": f"params:{model_parameters}"
        },
        namespace=f"{dataset}.{model_parameters}",
    )
    for dataset in _INPUT_DATASETS
    for model_parameters in _MODEL_PARAMETER_OPTIONS
)
combined_pipeline = sum(individual_pipelines)
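To see what the loop above produces without running kedro, here is a minimal plain-Python sketch of the six namespace strings and the dataset names they lead to. `make_namespace` is a hypothetical helper, not part of kedro; the only kedro behavior assumed is that datasets not remapped via `inputs`, `outputs` or `parameters` get prefixed with the namespace.

```python
from itertools import product

# Same names as in the example above.
INPUT_DATASETS = (
    "first_set_of_data",
    "second_set_of_data",
    "third_set_of_data",
)
MODEL_PARAMETER_OPTIONS = (
    "first_model_parameters",
    "second_model_parameters",
)


def make_namespace(*levels: str) -> str:
    """Hypothetical helper (not a kedro API): join namespace levels with a dot."""
    return ".".join(levels)


# One namespace string per (dataset, model parameters) combination: 3 * 2 = 6.
namespaces = [
    make_namespace(dataset, model_parameters)
    for dataset, model_parameters in product(INPUT_DATASETS, MODEL_PARAMETER_OPTIONS)
]

# Assumption: a dataset that is not remapped (here "model") ends up named
# "<namespace>.<dataset name>" inside each namespaced pipeline.
namespaced_models = [f"{namespace}.model" for namespace in namespaces]

for name in namespaced_models:
    print(name)
```

So each combination gets its own `model` and `model_metrics` entries, which you can then register in the catalog under those prefixed names if you want them persisted.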
i
@Yury Fedotov thank you so much!!! this is exactly what I was looking for🎯❤️