Camilo Piñón
08/21/2024, 8:34 AM
import logging

from kedro.pipeline import Pipeline, pipeline

# `settings` (which defines DYNAMIC_PIPELINES_MAPPING) and `new_train_template`
# come from project-specific modules.
log = logging.getLogger(__name__)


def create_pipeline() -> Pipeline:
    """Create a complete modelling pipeline that consolidates a single shared
    'split' stage with several modular instances of the 'train test evaluate'
    stage, and returns a single, appropriately namespaced Kedro pipeline object.
    """
    pipes = []
    log.debug(
        f"settings.DYNAMIC_PIPELINES_MAPPING.items(): {settings.DYNAMIC_PIPELINES_MAPPING.items()}"
    )
    for namespace, variants in settings.DYNAMIC_PIPELINES_MAPPING.items():
        log.debug(f"namespace: {namespace}")
        log.debug(f"variants: {variants}")
        for variant in variants:
            pipes.append(
                pipeline(
                    pipe=new_train_template(),
                    inputs={
                        "input_table": "input_table",
                    },
                    parameters={
                        "model_options": f"{namespace}.{variant}.model_options",
                    },
                    namespace=f"{namespace}.{variant}",
                    tags=[variant, namespace],
                )
            )
    return sum(pipes)
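For context, `settings.DYNAMIC_PIPELINES_MAPPING` is just a dict mapping each namespace to its model variants, roughly like the sketch below (only the "train_evaluation" namespace is taken from the thread; the variant names are illustrative):

```python
# settings.py (illustrative values)
DYNAMIC_PIPELINES_MAPPING = {
    "train_evaluation": ["model_a", "model_b", "model_c"],
}
```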
The problem is that MLflow considers all these models to belong to the same MLflow run, so I get this error when I run all models with `kedro run --namespace train_evaluation`:
[08/20/24 16:31:29] ERROR (nodes.py:125) Error during training: Changing param values is not allowed.
Param with key='threshold' was already logged with value='{'threshold_q90': 0.5356575641792262, 'threshold_q95': 0.612233234673521, 'threshold_q99': 0.7159441011394121}' for run ID='d3bb8837e8dc4cf28389816f67841d53'.
Attempted logging new value '{'threshold_q90': 0.5369235873602624, 'threshold_q95': 0.6154837134825515, 'threshold_q99': 0.6978202366080968}'.
The cause of this error is typically due to repeated calls to an individual run_id event logging.
Is it possible to change the run name or even the experiment dynamically for the different model training pipelines, so that every model has its own run / experiment?
Thank you in advance!

Dmitry Sorokin
08/21/2024, 9:20 AM
You could call `mlflow.start_run()` in the `before_pipeline_run()` hook in your case. What do you think?
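A minimal sketch of that idea (the class name is illustrative, it assumes the hook is registered in settings.py via HOOKS, and it assumes run_params exposes the value passed to --namespace):

```python
# hooks.py - illustrative sketch, not tested
import mlflow
from kedro.framework.hooks import hook_impl


class MlflowRunPerPipelineHook:
    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # Open a dedicated MLflow run for this kedro run; if kedro-mlflow has
        # already opened a run, nested=True makes this a child run.
        # run_params.get("namespace") assumes the --namespace value is passed through here.
        mlflow.start_run(run_name=run_params.get("namespace") or "default", nested=True)

    @hook_impl
    def after_pipeline_run(self, run_params, pipeline, catalog):
        mlflow.end_run()
```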
Camilo Piñón
08/21/2024, 10:05 AM

Dmitry Sorokin
08/21/2024, 10:43 AM

Camilo Piñón
08/21/2024, 10:47 AM

Camilo Piñón
08/21/2024, 11:36 AM
Since I `return sum(pipes)`, Kedro considers all the model training to be a single pipeline, so the hook is only invoked once. Maybe using a `before_node_run` that applies only to the specific node that performs the training step does the trick.

Dmitry Sorokin
08/21/2024, 11:42 AM
You can use `before_node_run()` or consider creating multiple pipelines.
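For completeness, a minimal sketch of the `before_node_run` approach (the class name and the "train_model" node-name suffix are hypothetical; it assumes kedro-mlflow has already opened a parent run, so each model gets its own nested child run):

```python
# hooks.py - illustrative sketch, not tested
import mlflow
from kedro.framework.hooks import hook_impl


class MlflowRunPerModelHook:
    # Hypothetical name of the training node produced by new_train_template()
    TRAIN_NODE_SUFFIX = "train_model"

    @hook_impl
    def before_node_run(self, node):
        # Open a child run named after the model's namespace,
        # e.g. "train_evaluation.model_a", before the training node executes.
        if node.namespace and node.name.endswith(self.TRAIN_NODE_SUFFIX):
            mlflow.start_run(run_name=node.namespace, nested=True)

    @hook_impl
    def after_node_run(self, node, outputs):
        # Close the child run once the training node has finished.
        if node.namespace and node.name.endswith(self.TRAIN_NODE_SUFFIX):
            mlflow.end_run()
```

The hook would still need to be registered in settings.py, e.g. `HOOKS = (MlflowRunPerModelHook(),)`.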