Олег Литвинов
12/04/2024, 6:51 PM
node(
    func=train_model,
    inputs=["df_train", "y_train", "df_val", "y_val", "parameters"],
    outputs="trained_recommender",
    name="train_model_node",
    tags=["training"],
),
Below is the link to the source code, which only accepts inputs where dataset_name.startswith("params:"):
https://github.com/Galileo-Galilei/kedro-mlflow/blob/master/kedro_mlflow/mlflow/kedro_pipeline_model.py#L122
Do I understand correctly that I have to manually define all the parameters I'm supposed to use?
It sounds surprising to see this error about kedro's default parameters 🙂
KedroPipelineModelError:
The datasets of the training pipeline must be persisted locally
to be used by the inference pipeline. You must enforce them as
non 'MemoryDataset' in the 'catalog.yml'.
Dataset 'parameters' is not persisted currently.
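A note on the error's suggestion: for the data inputs the inference pipeline needs (e.g. df_train), "persisting" means giving them a non-MemoryDataset entry in catalog.yml. A minimal sketch, where the dataset type and filepath are illustrative assumptions rather than anything from the thread:

# catalog.yml (illustrative entry; type and filepath are assumptions)
df_train:
  type: pandas.ParquetDataset
  filepath: data/05_model_input/df_train.parquet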
Yolan Honoré-Rougé
12/04/2024, 9:30 PM
# parameters.yml
model_config:
  Param1: value1
  Param2: value2
  Subdict1:
    Subparam: value3
And in your pipeline.py:
node(
    func=train_model,
    inputs=["df_train", "y_train", "df_val", "y_val", "params:model_config"],
    outputs="trained_recommender",
    name="train_model_node",
    tags=["training"],
),
This should work and is more readable and reproducible.
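For context, a minimal sketch of how train_model could consume the params:model_config input: Kedro resolves it from parameters.yml and passes it to the node as a plain dict. The function body and return value here are illustrative assumptions, not the code from the thread:

import pandas as pd

def train_model(
    df_train: pd.DataFrame,
    y_train: pd.Series,
    df_val: pd.DataFrame,
    y_val: pd.Series,
    model_config: dict,
):
    # "params:model_config" arrives as a plain dict, so nested values are
    # read with ordinary dict access.
    param1 = model_config["Param1"]
    subparam = model_config["Subdict1"]["Subparam"]

    # Placeholder "model"; a real node would fit an actual recommender here.
    trained_recommender = {
        "param1": param1,
        "subparam": subparam,
        "n_train_rows": len(df_train),
    }
    return trained_recommender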
@Juan Luis To follow up on another discussion, this is one of the things I'd like to clarify and eventually break in 0.20 / 1.0 😅
Олег Литвинов
12/04/2024, 9:34 PM
Yolan Honoré-Rougé
12/05/2024, 7:56 AM
Yolan Honoré-Rougé
12/05/2024, 7:58 AM