# questions
t
Following up on my previous question: after talking with Rashida I found that the problem is a little bit different, so I'm reposting. I’m failing to load the champion version of a wrapper SklearnPipeline model registered in MLflow. I want to save many experiments to MLflow and be able to load the champion version for other downstream pipelines. My catalog.yml looks like this:
model:
 type: kedro_mlflow.io.models.MlflowModelTrackingDataset
 flavor: mlflow.sklearn
 save_args:
  registered_model_name: model

model_loader:
 type: kedro_mlflow.io.models.MlflowModelRegistryDataset
 flavor: mlflow.sklearn
 model_name: "model"
 alias: "champion"
If I try to load the model dataset in a new Kedro session, it demands a run_id. If I try to use model_loader instead, it complains that the model (the wrapper SklearnPipeline object) doesn’t have a metadata attribute, giving this error message:
│ /opt/anaconda3/envs/topazDS_2/lib/python3.11/site-packages/kedro_mlflow/io/models/mlflow_model_r │
│ egistry_dataset.py:98 in _load                                                                   │
│                                                                                                  │
│    95 │   │   # because the same run can be registered under several different names             │
│    96 │   │   #  in the registry. See <https://github.com/Galileo-Galilei/kedro-mlflow/issues/5>   │
│    97 │   │   import pdb; pdb.set_trace()                                                        │
│ ❱  98 │   │   self._logger.info(f"Loading model from run_id='{model.metadata.run_id}'")          │
│    99 │   │   return model                                                                       │
│   100 │                                                                                          │
│   101 │   def _save(self, model: Any) -> None:                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'SklearnPipeline' object has no attribute 'metadata'

DatasetError: Failed while loading data from dataset 
kedro_mlflow.io.models.mlflow_model_registry_dataset.MlflowModelRegistryDataset(model_uri='models:/mill1_west_no_we
nco_st_model@champion', model_name='mill1_west_no_wenco_st_model', alias='champion', flavor='mlflow.sklearn', 
pyfunc_workflow='python_model').
'SklearnPipeline' object has no attribute 'metadata'
I think the MlflowModelRegistryDataset class wasn't expecting the model to be a sklearn object. There's probably a difference between how I'm saving the model (MlflowModelTrackingDataset) and how I'm loading it (MlflowModelRegistryDataset). How could I load the champion model? @Rashida Kanchwala @Ravi Kumar Pilla
👀 1
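For reference, the model_uri in the error above has the form models:/<name>@<alias>. A minimal sketch of loading that alias directly with MLflow, bypassing the kedro-mlflow dataset, might look like the following. It assumes mlflow is installed and the tracking/registry URI is already configured; the helper names registry_uri and load_champion are hypothetical and not part of either library.

```python
def registry_uri(model_name: str, alias: str = "champion") -> str:
    """Build a registry URI of the form shown in the error message."""
    return f"models:/{model_name}@{alias}"


def load_champion(model_name: str, alias: str = "champion"):
    """Load the aliased model version with the sklearn flavor.

    Assumption: mlflow is installed and already points at the right registry.
    """
    # Imported lazily so the URI helper above works without mlflow installed.
    import mlflow.sklearn

    return mlflow.sklearn.load_model(registry_uri(model_name, alias))


if __name__ == "__main__":
    # "model" is the registered name used in the catalog.yml above.
    print(registry_uri("model"))
```

Loading this way returns the object as the sklearn flavor saved it, so nothing here relies on a metadata attribute being present.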
r
As discussed, I will look at this and get back to you. In the meantime, @Yolan Honoré-Rougé if you have any suggestions, please let us know. Thank you
Hi @Thiago Valejo, the whole issue comes from the saved model not having the metadata field. If I comment out the logger call in MlflowModelRegistryDataset, it works fine. The logger call was introduced in the 0.13.3 release. I am not sure if there is a schema for models saved using MlflowModelTrackingDataset.
self._logger.info(f"Loading model from run_id='{model.metadata.run_id}'")
👍 1
@Yolan Honoré-Rougé thank you for responding. Is there a schema expected from the model? I think @Thiago Valejo is using a custom model
y
Can you open an issue on GitHub to keep track? I'll publish a fix in 2 weeks, but if someone can open a PR I can review and release. The simplest short term solution is to comment out logging, but I'll dig deeper when I have time
r
okay, I can open an issue and also a short term fix PR conditionally logging if there is no schema constraint for the model
👍 1
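The idea of logging conditionally can be sketched as below. This is an illustration only, not the code from the PR: it assumes the loaded object may or may not carry a metadata attribute with a run_id, and the helper names run_id_of and log_loaded_model, as well as the stand-in classes, are hypothetical.

```python
import logging
from types import SimpleNamespace
from typing import Any, Optional

logger = logging.getLogger(__name__)


def run_id_of(model: Any) -> Optional[str]:
    """Return model.metadata.run_id when both attributes exist, else None."""
    metadata = getattr(model, "metadata", None)
    return getattr(metadata, "run_id", None)


def log_loaded_model(model: Any) -> Any:
    """Log the run_id only if the model exposes one, then return the model."""
    run_id = run_id_of(model)
    if run_id is not None:
        logger.info("Loading model from run_id='%s'", run_id)
    else:
        logger.info("Loaded %s without run_id metadata", type(model).__name__)
    return model


class SklearnPipeline:
    """Stand-in for the custom wrapper in this thread; it has no metadata."""


# Stand-in for an object that does carry metadata.run_id.
with_metadata = SimpleNamespace(metadata=SimpleNamespace(run_id="abc123"))

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_loaded_model(SklearnPipeline())
    log_loaded_model(with_metadata)
```

Using getattr with a default avoids the AttributeError for objects without metadata while keeping the original log line for objects that have it.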
Hi @Thiago Valejo, which Python version are you using?
t
3.11
r
From the logs it looks like py311. Can you install kedro-mlflow 0.13.2 as a short-term workaround and try testing?
For tracking - PR - https://github.com/Galileo-Galilei/kedro-mlflow/pull/671 Issue - https://github.com/Galileo-Galilei/kedro-mlflow/issues/670 @Yolan Honoré-Rougé whenever you have time. Thank you
🥳 1
y
Thank you very much for the PR @Ravi Kumar Pilla . Can you upgrade to kedro 1.0 @Thiago Valejo? The fix will only be available for kedro-mlflow>=1.0.0 which is only compatible with kedro>=1.0 unfortunately :/
extreme teamwork 1
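A quick way to check whether an environment meets the version constraints mentioned above (kedro-mlflow>=1.0.0, kedro>=1.0) is sketched below. It uses only the standard library; the helper names are hypothetical, and the version parsing is deliberately simplistic (it ignores pre-release suffixes), so treat it as a rough check rather than a full version comparison.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '1.0' into (1, 0, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split(".")[:width]:
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad with zeros so '1.0' and '1.0.0' compare as equal.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(version: Optional[str], minimum: str) -> bool:
    """True when a version is present and is at least the given minimum."""
    if version is None:
        return False
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    for package, minimum in (("kedro", "1.0"), ("kedro-mlflow", "1.0.0")):
        found = installed_version(package)
        status = "ok" if meets_minimum(found, minimum) else "needs upgrade"
        print(package, found, status)
```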
r
He is on Kedro 1.0 when we spoke today
👍 1
y
The fix is on pypi, you can upgrade: https://pypi.org/project/kedro-mlflow/
🥳 1
👍 1
r
Thanks for the quick turnaround @Yolan Honoré-Rougé @Thiago Valejo please let us know if you have further issues. Thank you
t
Working like a charm. Thanks a lot @Ravi Kumar Pilla @Rashida Kanchwala and @Yolan Honoré-Rougé
🥳 2