Shah
10/09/2025, 4:47 PM

Shah
10/09/2025, 4:48 PM

Shah
10/09/2025, 4:53 PM
demo_modelling_pipeline receives presence_motion_co2_combined_cleaned_neg_removed and applies split_data_node, followed by the training and evaluation of LinearRegression. After this, in extended_modelling_pipeline, the split dataframes (X_train, X_test, y_train, y_test) are passed as inputs, and two nodes execute the training and evaluation of RandomForestRegressor.

Shah
10/09/2025, 5:13 PM
In the data_science/pipeline.py file, I have the base_data_science pipeline structure defined as:
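```python
from kedro.pipeline import Node, Pipeline

from .nodes import evaluate_model, show_data, split_data, train_model_LR  # assumed location of the node functions

# Shared steps (show + split) reused by the modelling pipelines
base_data_science = Pipeline(
    [
        Node(
            func=show_data,
            inputs=["presence_motion_co2_combined_cleaned_neg_removed", "parameters"],
            outputs=None,
            name="show_data_node",
        ),
        Node(
            func=split_data,
            inputs=["presence_motion_co2_combined_cleaned_neg_removed", "params:split_data"],
            outputs=["X_train", "X_test", "y_train", "y_test"],
            name="split_data_node",
        ),
    ]
)
```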
After that, I create demo_modelling_pipeline to run only the LR model training and evaluation, combining the previous two nodes (show and split) with the LR train and eval nodes.
```python
def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        [
            base_data_science,  # show + split, defined above
            Node(
                func=train_model_LR,
                inputs=["X_train", "y_train", "params:train_data"],
                outputs="model_LR",
                name="train_model_LR_node",
            ),
            Node(
                func=evaluate_model,
                inputs=["model_LR", "X_test", "y_test", "params:eval_model"],
                outputs="metrics_LR",
                name="evaluate_model_LR_node",
            ),
        ],
        namespace="demo_modelling_pipeline",
        prefix_datasets_with_namespace=False,
        parameters={"params:train_data": "params:demo_train_data", "params:eval_model": "params:demo_eval_model"},
        inputs={"presence_motion_co2_combined_cleaned_neg_removed"},
        # inputs={"X_train", "y_train", "X_test", "y_test"},
    )
```
Additionally, in the data_science_ext/pipeline.py file, I have another pipeline to execute the RF model training and evaluation after show and split.
```python
from kedro.pipeline import Node, Pipeline

# Assumed import paths; adjust to where these objects are actually defined
from ..data_science.pipeline import base_data_science
from ..data_science.nodes import evaluate_model
from .nodes import train_model_RF


def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        [
            base_data_science,  # the same show + split nodes as in the LR pipeline
            Node(
                func=train_model_RF,
                inputs=["X_train", "y_train", "params:train_data"],
                outputs="model_RF",
                name="train_model_RF_node",
            ),
            Node(
                func=evaluate_model,
                inputs=["model_RF", "X_test", "y_test", "params:eval_model"],
                outputs="metrics_RF",
                name="evaluate_model_RF_node",
            ),
        ],
        namespace="extended_modelling_pipeline",
        prefix_datasets_with_namespace=False,
        parameters={"params:train_data": "params:ext_train_data", "params:eval_model": "params:ext_eval_model"},
        inputs={"presence_motion_co2_combined_cleaned_neg_removed"},
        # inputs={"X_train", "y_train", "X_test", "y_test"},
    )
```
Thus, both pipelines have a similar structure:
• show data,
• split data,
• train model (LR/RF),
• eval model (LR/RF).
Both start from the same presence_motion_co2_combined_cleaned_neg_removed table, and the train-test split sets (X, y) are created internally. However, I get the following error when running kedro viz or kedro registry list:
OutputNotUniqueError: Output(s) ['X_test', 'X_train', 'y_test', 'y_train'] are returned by more than one nodes. Node outputs must be unique.
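A toy model of the rule behind this error may make it concrete. It is not Kedro's implementation; it only encodes what the message says, namely that each dataset name may be produced by at most one node in the combined pipeline. Both modular pipelines embed the same split_data_node, and with prefix_datasets_with_namespace=False the two namespaced copies keep the bare names X_train, X_test, y_train and y_test, so each name gets two producers. With the default prefixing, the copies would write to names such as demo_modelling_pipeline.X_train instead. The helpers below (`namespaced_outputs`, `duplicate_outputs`, `combined_split_nodes`) are hypothetical:

```python
from collections import defaultdict

SPLIT_OUTPUTS = ["X_train", "X_test", "y_train", "y_test"]


def namespaced_outputs(namespace: str, outputs: list[str], prefix: bool) -> list[str]:
    """Toy rule: dataset names a node writes to after namespacing."""
    return [f"{namespace}.{name}" if prefix else name for name in outputs]


def duplicate_outputs(nodes: dict[str, list[str]]) -> list[str]:
    """Sorted dataset names that more than one node claims to produce."""
    producers = defaultdict(list)
    for node_name, outputs in nodes.items():
        for dataset in outputs:
            producers[dataset].append(node_name)
    return sorted(name for name, owners in producers.items() if len(owners) > 1)


def combined_split_nodes(prefix: bool) -> dict[str, list[str]]:
    """Both modular pipelines embed split_data_node under their own namespace."""
    return {
        f"{ns}.split_data_node": namespaced_outputs(ns, SPLIT_OUTPUTS, prefix)
        for ns in ("demo_modelling_pipeline", "extended_modelling_pipeline")
    }


print(duplicate_outputs(combined_split_nodes(prefix=False)))  # ['X_test', 'X_train', 'y_test', 'y_train']
print(duplicate_outputs(combined_split_nodes(prefix=True)))   # []
```

The first list matches the names in the error message. Because the check concerns dataset names, giving the nodes distinct names does not by itself remove the clash.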
I even tried creating separate, uniquely named nodes for these four outputs, each with the corresponding namespace prefix. Yet the error persists.

Shah
10/09/2025, 5:24 PM

Shah
10/09/2025, 5:26 PM
The X_train, y_train, X_test and y_test nodes are not shown as outputs of the source table or of split_data_node; instead they are left hanging in the air, which is strange, since they are created as outputs of split_data_node.

Shah
10/09/2025, 5:31 PM
```python
# inputs={"presence_motion_co2_combined_cleaned_neg_removed"}
inputs={"X_train", "y_train", "X_test", "y_test"}
```
The error is:
PipelineError: Inputs must not be outputs from another node in the same pipeline
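This rule reads more naturally once the pipeline-level inputs argument is seen as the list of the modular pipeline's free inputs: datasets that come from outside it and are therefore exempt from namespace renaming. Here base_data_science sits inside the same Pipeline and produces X_train and the others, so they are not free inputs, and declaring them as such is rejected. The sketch below is a toy model of that check, assuming it amounts to "every declared input must be consumed, but not produced, inside the wrapped pipeline". The node lists ignore parameters, and `free_inputs` and `declared_inputs_ok` are hypothetical helpers:

```python
# Each toy node maps to (inputs, outputs); parameters are left out for brevity.
SOURCE = "presence_motion_co2_combined_cleaned_neg_removed"
SPLITS = ["X_train", "X_test", "y_train", "y_test"]

DEMO_NODES = {
    "show_data_node": ([SOURCE], []),
    "split_data_node": ([SOURCE], SPLITS),
    "train_model_LR_node": (["X_train", "y_train"], ["model_LR"]),
    "evaluate_model_LR_node": (["model_LR", "X_test", "y_test"], ["metrics_LR"]),
}

# An RF pipeline that does NOT embed base_data_science and consumes the splits instead.
RF_ONLY_NODES = {
    "train_model_RF_node": (["X_train", "y_train"], ["model_RF"]),
    "evaluate_model_RF_node": (["model_RF", "X_test", "y_test"], ["metrics_RF"]),
}


def free_inputs(nodes: dict[str, tuple[list[str], list[str]]]) -> set[str]:
    """Datasets consumed by some node but produced by none of them."""
    consumed = {d for inputs, _ in nodes.values() for d in inputs}
    produced = {d for _, outputs in nodes.values() for d in outputs}
    return consumed - produced


def declared_inputs_ok(nodes: dict[str, tuple[list[str], list[str]]], declared: set[str]) -> bool:
    """True when every declared pipeline input is a free input of the wrapped nodes."""
    return declared <= free_inputs(nodes)


print(declared_inputs_ok(DEMO_NODES, {SOURCE}))        # True
print(declared_inputs_ok(DEMO_NODES, set(SPLITS)))     # False: produced by split_data_node
print(declared_inputs_ok(RF_ONLY_NODES, set(SPLITS)))  # True
```

The last case corresponds to the design described at 4:53 PM: if the RF pipeline does not embed base_data_science, the split datasets are genuinely external to it and may be listed in inputs.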
This is equally strange because, as far as I understand, the inputs must be outputs of another node in the same pipeline.

Shah
10/09/2025, 6:08 PM
data_science/pipeline.py contains:
```python
namespace="demo_ds",
# prefix_datasets_with_namespace=False,
parameters={"params:split_data": "params:demo_split_data", "params:train_data": "params:demo_train_data", "params:eval_model": "params:demo_eval_model"},
inputs={"presence_motion_co2_combined_cleaned_neg_removed"}
# inputs={"X_train", "y_train", "X_test", "y_test"}
```
and data_science_ext/pipeline.py contains:
```python
namespace="ext_ds",
# prefix_datasets_with_namespace=False,
parameters={"params:split_data": "params:ext_split_data", "params:train_data": "params:ext_train_data", "params:eval_model": "params:ext_eval_model"},
inputs={"presence_motion_co2_combined_cleaned_neg_removed"}
# inputs={"X_train", "y_train", "X_test", "y_test"}
```
and the overall pipeline is displayed as:

Rashida Kanchwala
10/09/2025, 8:35 PM

Shah
10/10/2025, 8:48 AM
prefix_datasets_with_namespace=False was working well. So it seemed logical that it would keep working, since I was supplying a unique set of parameters. But this time, the same inputs are shared by two different nodes; perhaps that was the pain point.
Suggestion: the two error messages above were confusing and somewhat misleading, particularly the second one: normally the output of one node is the input to the next, so that error seemed inappropriate. I may be wrong in my understanding, though.

Ankita Katiyar
10/10/2025, 10:18 AM
prefix_datasets_with_namespace was meant to disable the prefixing in situations where you’re using namespaces as a “deployment unit”. When re-using a base pipeline with different parameters or inputs, it is recommended to keep the datasets namespaced as well to avoid ambiguity (or to provide an explicit mapping in inputs and outputs).
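A simplified model of the renaming that namespacing applies may help here. It is a sketch loosely following Kedro's modular-pipeline behaviour, not its source code, and it assumes: a name found in the explicit mapping (inputs, outputs or parameters) is replaced by its target; the special "parameters" dataset is never renamed; any other params:<key> becomes params:<namespace>.<key>; every other dataset becomes <namespace>.<name>. The prefix_datasets flag stands in for prefix_datasets_with_namespace and, as an assumption, only affects datasets. `rename` and `DEMO_MAPPING` are hypothetical names; the mapping mirrors the demo_ds settings from the 6:08 PM message.

```python
def rename(name: str, namespace: str, mapping: dict[str, str], prefix_datasets: bool = True) -> str:
    """Toy model of how a namespaced modular pipeline renames one dataset or parameter."""
    if name in mapping:  # an explicit inputs/outputs/parameters mapping always wins
        return mapping[name]
    if name == "parameters":  # the full parameters dictionary keeps its name
        return name
    if name.startswith("params:"):  # single parameters get the namespace after "params:"
        return f"params:{namespace}.{name[len('params:'):]}"
    return f"{namespace}.{name}" if prefix_datasets else name


SOURCE = "presence_motion_co2_combined_cleaned_neg_removed"
DEMO_MAPPING = {
    SOURCE: SOURCE,  # inputs={SOURCE} behaves like an identity mapping
    "params:split_data": "params:demo_split_data",
    "params:train_data": "params:demo_train_data",
    "params:eval_model": "params:demo_eval_model",
}

for original in [SOURCE, "params:split_data", "parameters", "X_train", "metrics_LR"]:
    print(f"{original} -> {rename(original, 'demo_ds', DEMO_MAPPING)}")

# An explicit outputs mapping, e.g. outputs={"metrics_LR": "metrics_LR"}, keeps that name un-prefixed.
print(rename("metrics_LR", "demo_ds", {**DEMO_MAPPING, "metrics_LR": "metrics_LR"}))  # metrics_LR
```

With prefixing on, the two copies of X_train become demo_ds.X_train and ext_ds.X_train, so they no longer collide; an explicit mapping is the way to expose a chosen output under a shared, un-prefixed name.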
I’ll take a closer look soon; the error messages do seem a bit cryptic.

Shah
10/10/2025, 4:41 PM

Shah
10/10/2025, 6:41 PM
demo_ds and ext_ds), after training their individual models, come together for the evaluation method. I am not sure if that's even possible. If so, could you please elaborate on the best way to achieve it?
I tried creating evaluate_model_node in both pipelines with exactly the same inputs and outputs. I even renamed model_LR and model_RF to just model, so that both refer to the same file. Yet the best I could get was the following (two separate nodes, one per pipeline, although with the same name):
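One way to bring the two branches together is to leave training and per-model evaluation inside each namespace and add a single comparison node outside both namespaces that reads their namespaced outputs, for example demo_ds.metrics_LR and ext_ds.metrics_RF. With dataset prefixing on, a dataset called model inside each namespace becomes demo_ds.model and ext_ds.model, which are different names, so renaming alone cannot merge the branches, and two evaluate_model nodes in different namespaces stay distinct nodes. The node function below is a hypothetical sketch: it assumes evaluate_model returns a dictionary of metric name to value and that a higher score is better (true for R², not for error metrics such as RMSE). `compare_models`, `model_comparison`, `demo_pipeline`, `ext_pipeline` and the "r2" key are illustrative names, and the Kedro wiring is shown only as a comment.

```python
# Hypothetical Kedro wiring, added next to the two namespaced pipelines:
#
#   compare_node = Node(
#       func=compare_models,
#       inputs=["demo_ds.metrics_LR", "ext_ds.metrics_RF"],
#       outputs="model_comparison",
#       name="compare_models_node",
#   )
#   combined = demo_pipeline + ext_pipeline + Pipeline([compare_node])


def compare_models(metrics_lr: dict[str, float], metrics_rf: dict[str, float], metric: str = "r2") -> dict:
    """Place both models' metrics side by side and pick the one with the higher `metric`."""
    table = {"LinearRegression": metrics_lr, "RandomForestRegressor": metrics_rf}
    best = max(table, key=lambda model: table[model][metric])
    return {"metrics": table, "best_model": best, "selected_by": metric}


# Dummy numbers, only to show the shape of the result.
print(compare_models({"r2": 0.55}, {"r2": 0.72})["best_model"])  # RandomForestRegressor
```

Keeping the comparison outside the namespaces means neither modular pipeline needs to know about the other; each only has to expose its metrics.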