# questions
c
Hello guys! I have set up a dynamic modular pipeline that runs model training for different models:
```python
def create_pipeline() -> Pipeline:
    """Create a complete modelling pipeline that consolidates a single
    shared 'split' stage and several modular instances of the
    'train test evaluate' stage, and return a single, appropriately
    namespaced Kedro pipeline object.
    """
    pipes = []
    log.debug(
        f"settings.DYNAMIC_PIPELINES_MAPPING.items(): {settings.DYNAMIC_PIPELINES_MAPPING.items()}"
    )

    for namespace, variants in settings.DYNAMIC_PIPELINES_MAPPING.items():
        log.debug(f"namespace: {namespace}")
        log.debug(f"variants: {variants}")

        for variant in variants:
            pipes.append(
                pipeline(
                    pipe=new_train_template(),
                    inputs={
                        "input_table": "input_table",
                    },
                    parameters={
                        "model_options": f"{namespace}.{variant}.model_options",
                    },
                    namespace=f"{namespace}.{variant}",
                    tags=[variant, namespace],
                )
            )
    return sum(pipes)
```
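(For context, a hypothetical shape of the mapping driving that loop — not the actual project config — would be a namespace-to-variants dict, with each (namespace, variant) pair producing one namespaced parameter key:)

```python
# Hypothetical example of the mapping consumed by create_pipeline():
# namespace -> list of model variants. Names are illustrative only.
DYNAMIC_PIPELINES_MAPPING = {
    "train_evaluation": ["linear_regression", "random_forest"],
}

# Each (namespace, variant) pair yields one modular pipeline instance,
# parameterised by a namespaced key like "train_evaluation.<variant>.model_options".
keys = [
    f"{ns}.{variant}.model_options"
    for ns, variants in DYNAMIC_PIPELINES_MAPPING.items()
    for variant in variants
]
print(keys)
```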
The problem is that MLflow considers all these models to belong to the same MLflow run, so I get this error when I run all the models with `kedro run --namespace train_evaluation`:
```
[08/20/24 16:31:29] ERROR  Error during training: Changing param values is not allowed.   nodes.py:125
Param with key='threshold' was already logged with value='{'threshold_q90': 0.5356575641792262,
'threshold_q95': 0.612233234673521, 'threshold_q99': 0.7159441011394121}'
for run ID='d3bb8837e8dc4cf28389816f67841d53'.
Attempted logging new value '{'threshold_q90': 0.5369235873602624,
'threshold_q95': 0.6154837134825515, 'threshold_q99': 0.6978202366080968}'.

The cause of this error is typically due to repeated calls
to an individual run_id event logging.
```
Is it possible to change the run name, or even the experiment, dynamically for the different model-training pipelines so that every model gets its own run/experiment? Thank you in advance!
d
Hi Camilo, how are you using MLflow with Kedro? It seems like it might make sense to call `mlflow.start_run()` in the `before_pipeline_run()` hook in your case. What do you think?
❤️ 1
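(A minimal sketch of such a hook, for reference. The class name `MlflowRunHooks` is illustrative, the hook must be registered via `HOOKS` in the project's `settings.py`, and the `try/except` is only there so the snippet imports even without Kedro/MLflow installed:)

```python
try:
    import mlflow
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch can be read/imported standalone
    mlflow = None
    hook_impl = lambda f: f


class MlflowRunHooks:
    """Illustrative hook: open one MLflow run per `kedro run` invocation."""

    @hook_impl
    def before_pipeline_run(self, run_params):
        # run_params is the dict Kedro passes to this hook; "pipeline_name"
        # is the name given to `kedro run --pipeline=...` (may be None).
        mlflow.start_run(run_name=run_params.get("pipeline_name") or "__default__")

    @hook_impl
    def after_pipeline_run(self, run_params):
        # Close the run so the next `kedro run` starts a fresh one.
        mlflow.end_run()
```

(Registered with something like `HOOKS = (MlflowRunHooks(),)` in `settings.py`.)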
c
Yeah, maybe that is the better solution. I was debating between something like that (although I am not very familiar with hooks and will have to read up on them) and trying to use OmegaConfigLoader with a custom resolver to modify the MLflow config (although I wasn't really sure how to implement that, or whether it was even feasible...). Thanks!!
👍 1
d
I believe hooks might be easier to work with. You can find more information about them here: https://docs.kedro.org/en/stable/hooks/introduction.html. If you run into any difficulties, feel free to reach out and we'll try to assist you.
c
Already checking it 🙂 Thank you!
I think I am now running into the same problem: since I am returning `sum(pipes)`, Kedro considers all the model training to be a single pipeline, so the hook is only invoked once. Maybe using a `before_node_run` hook that applies only to the specific node that performs the training step would do the trick.
d
Yes, you're right. You are essentially creating a single pipeline composed of nodes from different namespaces. So you can try using `before_node_run()`, or consider creating multiple pipelines.
👍 1
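(A sketch of that per-node variant. Two assumptions to adjust for your project: training nodes can be identified by "train" in their name, and the node's namespace, e.g. `train_evaluation.model_a`, is a sensible run name. The `try/except` only lets the snippet import without Kedro/MLflow installed:)

```python
try:
    import mlflow
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch imports standalone
    mlflow = None
    hook_impl = lambda f: f


class MlflowPerModelHooks:
    """Illustrative hook: one MLflow run per training node, keyed by namespace."""

    @staticmethod
    def _is_training_node(node) -> bool:
        # Assumption: training nodes have "train" in their name;
        # matching on a tag would work just as well.
        return "train" in node.name

    @hook_impl
    def before_node_run(self, node):
        if self._is_training_node(node):
            # One run per model variant, named after its namespace.
            mlflow.start_run(run_name=node.namespace)

    @hook_impl
    def after_node_run(self, node):
        if self._is_training_node(node):
            mlflow.end_run()
```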