question about kedro-mlflow, regarding the use of ...
# plugins-integrations
h
Question about kedro-mlflow, regarding the use of `MlflowModelRegistryDataSet` in the kedro-mlflow integration for logging models to MLflow's model registry, as documented in the Kedro-MLflow Python Objects section. Initially, I followed the documentation for using `MlflowModelLoggerDataSet` in the `catalog.yml` file, which I implemented successfully. However, I ran into confusion with `MlflowModelRegistryDataSet`. My initial attempt was based on the following configuration:
```yaml
my_transformer_model:
  type: kedro_mlflow.io.models.MlflowModelRegistryDataSet
  flavor: mlflow.transformers
  model_name: my_transformer_model_name
  stage_or_version: staging
```
When trying to save a model using `catalog.save("my_transformer_model", model)`, I received a `DatasetError` indicating that the 'save' method is not implemented for `MlflowModelRegistryDataSet`. The documentation lists the parameters for this dataset but lacks a clear example of how to use it to save and register a model in MLflow. Moving forward, I found a working solution for logging the transformer model with the YAML API:
```yaml
my_transformer_model:
    type: kedro_mlflow.io.models.MlflowModelLoggerDataSet
    flavor: mlflow.transformers
    save_args:
        registered_model_name: "my_transformer_model_name"
```
This allowed me to save the model to MLflow and load it back successfully. However, this usage is not documented as such. For model loading, I could indeed use the initial catalog entry to load specific versions directly. Yet I still have two unresolved questions:

Model staging/versioning: how can I stage or version the model directly through the API, instead of using the MLflow UI? That is, keep using `MlflowModelLoggerDataSet` to save, but also specify a version/stage.

Metrics: how can I view the metrics associated with the model's training run in the MLflow model UI, so that I can efficiently promote the best model to staging?

I think including practical examples in the official documentation would significantly enhance the user experience.
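To make the staging question concrete, here is a minimal sketch of what doing it "through the API" could look like with the plain MLflow client, outside of kedro-mlflow. It is not something the kedro-mlflow docs prescribe. The function name `promote_best_version` and the metric name used in the usage note are hypothetical; the `client` argument is assumed to behave like `mlflow.tracking.MlflowClient`, of which only `search_model_versions`, `get_run` and `transition_model_version_stage` are used. Recent MLflow releases deprecate stages in favour of aliases, so this should be checked against the installed version.

```python
from typing import Any, Optional


def promote_best_version(
    client: Any,
    model_name: str,
    metric: str,
    stage: str = "Staging",
    higher_is_better: bool = True,
) -> Optional[str]:
    """Move the registered version whose training run has the best `metric` to `stage`.

    `client` is assumed to expose the MlflowClient methods used below.
    Returns the promoted version, or None if no version's run logged the metric.
    """
    best_version, best_value = None, None

    # Each registered model version keeps the run_id of the run that logged it,
    # which is how the training metrics can be looked up.
    for mv in client.search_model_versions(f"name='{model_name}'"):
        if not mv.run_id:
            continue  # version was registered without an associated run
        value = client.get_run(mv.run_id).data.metrics.get(metric)
        if value is None:
            continue  # this run did not log the requested metric
        is_better = best_value is None or (
            value > best_value if higher_is_better else value < best_value
        )
        if is_better:
            best_version, best_value = mv.version, value

    if best_version is not None:
        client.transition_model_version_stage(
            name=model_name, version=best_version, stage=stage
        )
    return best_version
```

With a real tracking server, this would presumably be called as `promote_best_version(MlflowClient(), "my_transformer_model_name", "eval_loss")` after `from mlflow.tracking import MlflowClient`, where `eval_loss` stands in for whatever metric the training run actually logged.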
y
Hello, I was indeed going to suggest going with `MlflowModelLoggerDataSet`, but I understand it is confusing. The rationale here is that the model registry dataset aims at transitioning an existing model to the registry. Someone already mentioned that I should document it better, or even consider merging the two datasets.
Would you mind opening an issue in the kedro-mlflow repo?
h
Yes, I'll open an issue, thanks!