#questions

Lakshay Khurana

03/19/2024, 7:22 AM
Hi Kedro Community, We are trying to implement experiment tracking for our ML project with the help of the kedro-mlflow plugin. The model we have is a bagged classifier model. When we run the training pipeline the model object is saved as mlflow.pyfunc.PyFuncModel, and while doing batch predictions for new data we get the error "object of type 'PyFuncModel' has no len()". Is there a workaround for this?
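That error usually means the PyFuncModel object itself ended up somewhere that expects array-like data (e.g. passed to `len()` or fed to a predict call as if it were a DataFrame). A minimal, mlflow-free illustration of the error class (the class name is a stand-in, not the author's code):

```python
# Minimal illustration of the error, independent of mlflow: a PyFuncModel
# (like this stand-in) defines no __len__, so passing the model object
# anywhere that expects array-like data raises TypeError.
class PyFuncModelStandIn:
    """Stand-in for mlflow.pyfunc.PyFuncModel: has predict(), no __len__."""

    def predict(self, model_input):
        # A real PyFuncModel would dispatch to the underlying flavor here.
        return [0 for _ in model_input]


model = PyFuncModelStandIn()
try:
    len(model)  # e.g. the model object fed into code expecting a DataFrame
except TypeError as err:
    print(err)  # object of type 'PyFuncModelStandIn' has no len()
```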

Juan Luis

03/19/2024, 7:31 AM
cc @Yolan Honoré-Rougé @Takieddine Kadiri batsignal

Yolan Honoré-Rougé

03/19/2024, 7:58 AM
A few questions:
• Is your model a custom mlflow model or a standard one (e.g. a sklearn model)? Can you show the code?
• Are you using the same versions of mlflow/kedro/sklearn... for training and inference? Can you tell which versions?
• Are you saving the model with kedro_mlflow.io.models.MlflowModelTrackingDataset in your catalog? Can you show the entry?

Lakshay Khurana

03/19/2024, 8:03 AM
@Yolan Honoré-Rougé thanks for the response.
1. The model is a bagged logistic regression model.
2. Yes, using the same versions. Building a POC using the official documentation and following the steps given in the tutorial: https://github.com/Galileo-Galilei/kedro-mlflow-tutorial/blob/main/README.md
3. Saving the trained model as
```yaml
aggregated_linear_classifier_model_mlflow:
  type: pickle.PickleDataset
  filepath: data/06_models/mlflow_implementation/aggregated_linear_classifier_model_mlflow.pkl
```
and picking the model as pipeline_inference_model
```yaml
pipeline_inference_model:
  type: kedro_mlflow.io.models.MlflowModelTrackingDataset
  flavor: mlflow.pyfunc
  pyfunc_workflow: python_model
  artifact_path: mlflow_implementation
  run_id: f4d03b05236f4ae5927164a60ce86ed3
```

Yolan Honoré-Rougé

03/19/2024, 8:04 AM
Mlflow cannot read a pickle directly; it needs its own format. You need to save AND load the model with MlflowModelTrackingDataset.
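A sketch of what this suggestion could look like in the catalog — two entries for the same model, one logging it to mlflow in mlflow's own format at training time and one loading it back for inference. Entry names, the `model` artifact path, and the run id placeholder are illustrative, not from the thread:

```yaml
# Illustrative catalog entries (names and paths are placeholders).
# Save the trained model in mlflow's own format instead of a raw pickle:
trained_model:
  type: kedro_mlflow.io.models.MlflowModelTrackingDataset
  flavor: mlflow.sklearn
  artifact_path: model

# Load it back for inference (run_id pins a specific training run):
pipeline_inference_model:
  type: kedro_mlflow.io.models.MlflowModelTrackingDataset
  flavor: mlflow.sklearn
  artifact_path: model
  run_id: <run id of the training run>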
But you can have 2 different entries in the catalog if needed
Are you using pipeline_ml_factory?

Lakshay Khurana

03/19/2024, 8:06 AM
yes, using pipeline_ml_factory

Yolan Honoré-Rougé

03/19/2024, 8:10 AM
So it's a different issue
Do you have other preprocessing /post-processing code you want to bundle with your model? Other artifacts (like encoders...)?
Can you show me a screen capture of your mlflow model in the UI?

Lakshay Khurana

03/19/2024, 8:12 AM
yes, I do have preprocessing code to be bundled with the model
image.png

Yolan Honoré-Rougé

03/19/2024, 8:12 AM
If you do kedro run --pipeline=inference, does it work? This does not use mlflow, so if this does not work, the problem lies in your kedro pipeline.
You can save the model as a pickle for pipeline_ml_factory; kedro_mlflow will store it as an artifact, not a model.

Lakshay Khurana

03/19/2024, 8:20 AM
inference pipeline is working

Yolan Honoré-Rougé

03/19/2024, 8:21 AM
Hum very weird
Sorry I don't have time right now
Can you create a reproducible example you can share on github?
I'll look at it tonight

Lakshay Khurana

03/19/2024, 8:22 AM
No worries, thanks for the responses. Sure, I will create one and share. Meanwhile I will try to debug this more.
FYI, I am new to kedro and mlflow, so might be missing out something at my end. Will check again

Yolan Honoré-Rougé

03/19/2024, 6:17 PM
Hi, do you have an update?

Lakshay Khurana

03/21/2024, 5:39 AM
@Yolan Honoré-Rougé apologies, I missed your message yesterday. I simplified my approach and made a couple of changes in the pipeline:
1. Instead of using a bagged logistic regression model, I updated the code to use a standalone sklearn logistic regression model.
2. Instead of calculating predicted probabilities, I classified the predictions using model.predict.
After these changes the pipeline ran without any error. After some googling, we realised that we need to write a custom model class for this. Now trying to work on this. Does this seem feasible to you?
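The custom model class being considered here usually follows mlflow's pyfunc pattern: a wrapper whose `predict()` forwards to the estimator's `predict_proba()`. A minimal sketch of that pattern with a dummy estimator so it runs without mlflow or sklearn installed; in real code the wrapper would subclass `mlflow.pyfunc.PythonModel` and be logged with `mlflow.pyfunc.log_model(python_model=...)`:

```python
# Sketch of the pyfunc wrapper pattern (dummy estimator; in real code this
# class would subclass mlflow.pyfunc.PythonModel).
class DummyClassifier:
    """Stand-in for the fitted sklearn estimator."""

    def predict_proba(self, rows):
        # Pretend every row gets probabilities 0.25 / 0.75.
        return [[0.25, 0.75] for _ in rows]


class ProbaWrapper:
    """Exposes predict_proba through the pyfunc predict() interface."""

    def __init__(self, estimator):
        self.estimator = estimator

    def predict(self, context, model_input):
        # context is unused here; mlflow passes it for artifact access.
        return self.estimator.predict_proba(model_input)


wrapped = ProbaWrapper(DummyClassifier())
print(wrapped.predict(None, [[1.0], [2.0]]))  # [[0.25, 0.75], [0.25, 0.75]]
```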

Yolan Honoré-Rougé

03/21/2024, 6:18 PM
No problem. Actually pipeline_ml_factory is supposed to work with any kedro pipeline (provided it has a single free input and a single output, which is an mlflow constraint), so I don't know what the problem is, but your initial workflow looks fine. What it does under the hood is precisely to create a custom mlflow model from a kedro pipeline, and while it's of course possible to create your own, I don't think that will be easier than using the kedro-mlflow one.
In case you still want to create your own mlflow custom model, you can take inspiration from https://github.com/Galileo-Galilei/kedro-mlflow/blob/master/kedro_mlflow%2Fmlflow%2Fkedro_pipeline_model.py, which is what pipeline_ml_factory uses under the hood.
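The KedroPipelineModel linked above implements, roughly, this idea: a model object that replays the inference pipeline's steps in order at predict time. A drastically simplified, kedro- and mlflow-free sketch of that pattern (the step functions are made-up placeholders):

```python
# Drastically simplified sketch of the "pipeline as model" idea behind
# kedro_mlflow's KedroPipelineModel; the real class runs actual kedro nodes
# and subclasses mlflow.pyfunc.PythonModel.
class PipelineModel:
    def __init__(self, steps):
        # steps: ordered callables, e.g. [preprocess, classify]
        self.steps = steps

    def predict(self, data):
        # Thread the data through each step, like running the pipeline.
        for step in self.steps:
            data = step(data)
        return data


def preprocess(rows):
    # Placeholder preprocessing node: scale features.
    return [[x * 2 for x in row] for row in rows]


def classify(rows):
    # Placeholder model node: threshold on the feature sum.
    return [1 if sum(row) > 5 else 0 for row in rows]


model = PipelineModel([preprocess, classify])
print(model.predict([[1, 1], [2, 2]]))  # [0, 1]
```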