#questions

Lakshay Khurana

03/19/2024, 7:22 AM
Hi Kedro Community, We are trying to implement experiment tracking for our ML project with the help of the kedro-mlflow plugin. The model we have is a bagged classifier model. When we run the training pipeline the model object is saved as mlflow.pyfunc.PyFuncModel, and while doing batch predictions for new data we get the error "object of type 'PyFuncModel' has no len()". Is there a workaround for this?
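That error usually means the PyFuncModel object itself ended up somewhere that expects array-like data (e.g. passed to `len()` or fed to a predict call as if it were a DataFrame). A minimal, mlflow-free illustration of the error class (the class name is a stand-in, not the author's code):

```python
# Minimal illustration of the error, independent of mlflow: a PyFuncModel
# (like this stand-in) defines no __len__, so passing the model object
# anywhere that expects array-like data raises TypeError.
class PyFuncModelStandIn:
    """Stand-in for mlflow.pyfunc.PyFuncModel: has predict(), no __len__."""

    def predict(self, model_input):
        # A real PyFuncModel would dispatch to the underlying flavor here.
        return [0 for _ in model_input]


model = PyFuncModelStandIn()
try:
    len(model)  # e.g. the model object fed into code expecting a DataFrame
except TypeError as err:
    print(err)  # object of type 'PyFuncModelStandIn' has no len()
```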

Juan Luis

03/19/2024, 7:31 AM
cc @Yolan Honoré-Rougé @Takieddine Kadiri batsignal

Yolan Honoré-Rougé

03/19/2024, 7:58 AM
A few questions:
• Is your model a custom mlflow model or a standard one (e.g. a sklearn model)? Can you show the code?
• Are you using the same versions of mlflow/kedro/sklearn... for training and inference? Can you tell which versions?
• Are you saving the model with kedro_mlflow.io.models.MlflowModelTrackingDataset in your catalog? Can you show the entry?

Lakshay Khurana

03/19/2024, 8:03 AM
@Yolan Honoré-Rougé thanks for the response.
1. The model is a bagged logistic regression model.
2. Yes, using the same versions. Building a POC using the official documentation and following the steps given in the tutorial: https://github.com/Galileo-Galilei/kedro-mlflow-tutorial/blob/main/README.md
3. Saving the trained model as
```yaml
aggregated_linear_classifier_model_mlflow:
  type: pickle.PickleDataset
  filepath: data/06_models/mlflow_implementation/aggregated_linear_classifier_model_mlflow.pkl
```
and picking the model as pipeline_inference_model
```yaml
pipeline_inference_model:
  type: kedro_mlflow.io.models.MlflowModelTrackingDataset
  flavor: mlflow.pyfunc
  pyfunc_workflow: python_model
  artifact_path: mlflow_implementation
  run_id: f4d03b05236f4ae5927164a60ce86ed3
```

Yolan Honoré-Rougé

03/19/2024, 8:04 AM
Mlflow cannot read a pickle directly; it needs its own format. You need to save AND load the model with MlflowModelTrackingDataset.
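A sketch of what this suggestion could look like in the catalog — two entries for the same model, one logging it to mlflow in mlflow's own format at training time and one loading it back for inference. Entry names, the `model` artifact path, and the run id placeholder are illustrative, not from the thread:

```yaml
# Illustrative catalog entries (names and paths are placeholders).
# Save the trained model in mlflow's own format instead of a raw pickle:
trained_model:
  type: kedro_mlflow.io.models.MlflowModelTrackingDataset
  flavor: mlflow.sklearn
  artifact_path: model

# Load it back for inference (run_id pins a specific training run):
pipeline_inference_model:
  type: kedro_mlflow.io.models.MlflowModelTrackingDataset
  flavor: mlflow.sklearn
  artifact_path: model
  run_id: <run id of the training run>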
But you can have 2 different entries in the catalog if needed
Are you using pipeline_ml_factory?

Lakshay Khurana

03/19/2024, 8:06 AM
yes, using pipeline_ml_factory

Yolan Honoré-Rougé

03/19/2024, 8:10 AM
So it's a different issue
Do you have other preprocessing /post-processing code you want to bundle with your model? Other artifacts (like encoders...)?
Can you show me a screen capture of your mlflow model in the UI?

Lakshay Khurana

03/19/2024, 8:12 AM
yes, I do have preprocessing code to be bundled with the model
image.png

Yolan Honoré-Rougé

03/19/2024, 8:12 AM
If you do kedro run --pipeline=inference, does it work? This does not use mlflow, so if this does not work, the problem lies in your kedro pipeline.
You can save the model as a pickle for pipeline_ml_factory; kedro_mlflow will store it as an artifact, not a model.

Lakshay Khurana

03/19/2024, 8:20 AM
inference pipeline is working

Yolan Honoré-Rougé

03/19/2024, 8:21 AM
Hum very weird
Sorry I don't have time right now
Can you create a reproducible example you can share on github?
I'll look at it tonight

Lakshay Khurana

03/19/2024, 8:22 AM
No worries, thanks for the responses. Sure, I will create one and share. Meanwhile I will try to debug this more.
FYI, I am new to kedro and mlflow, so might be missing out something at my end. Will check again

Yolan Honoré-Rougé

03/19/2024, 6:17 PM
Hi, do you have an update?

Lakshay Khurana

03/21/2024, 5:39 AM
@Yolan Honoré-Rougé apologies, I missed your message yesterday. I simplified my approach and made a couple of changes in the pipeline:
1. Instead of using a bagged logistic regression model, I updated the code to use a standalone sklearn logistic regression model.
2. Instead of calculating predicted probabilities, I classified the predictions using model.predict.
After these changes the pipeline ran without any error. After some googling, we realised that we need to write a custom model class for this. Now trying to work on this. Does this seem feasible to you?
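The custom model class being considered here usually follows mlflow's pyfunc pattern: a wrapper whose `predict()` forwards to the estimator's `predict_proba()`. A minimal sketch of that pattern with a dummy estimator so it runs without mlflow or sklearn installed; in real code the wrapper would subclass `mlflow.pyfunc.PythonModel` and be logged with `mlflow.pyfunc.log_model(python_model=...)`:

```python
# Sketch of the pyfunc wrapper pattern (dummy estimator; in real code this
# class would subclass mlflow.pyfunc.PythonModel).
class DummyClassifier:
    """Stand-in for the fitted sklearn estimator."""

    def predict_proba(self, rows):
        # Pretend every row gets probabilities 0.25 / 0.75.
        return [[0.25, 0.75] for _ in rows]


class ProbaWrapper:
    """Exposes predict_proba through the pyfunc predict() interface."""

    def __init__(self, estimator):
        self.estimator = estimator

    def predict(self, context, model_input):
        # context is unused here; mlflow passes it for artifact access.
        return self.estimator.predict_proba(model_input)


wrapped = ProbaWrapper(DummyClassifier())
print(wrapped.predict(None, [[1.0], [2.0]]))  # [[0.25, 0.75], [0.25, 0.75]]
```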

Yolan Honoré-Rougé

03/21/2024, 6:18 PM
No problem. Actually pipeline_ml_factory is supposed to work with any kedro pipeline (provided it has a single free input and a single output, which is an mlflow constraint), so I don't know what the problem is, but your initial workflow looks fine. What it does under the hood is precisely to create a custom mlflow model from a kedro pipeline, and while it's of course possible to create your own, I don't think that will be easier than using the kedro-mlflow one.
In case you still want to create your own mlflow custom model, you can take inspiration from https://github.com/Galileo-Galilei/kedro-mlflow/blob/master/kedro_mlflow%2Fmlflow%2Fkedro_pipeline_model.py, which is what pipeline_ml_factory uses under the hood.
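The KedroPipelineModel linked above implements, roughly, this idea: a model object that replays the inference pipeline's steps in order at predict time. A drastically simplified, kedro- and mlflow-free sketch of that pattern (the step functions are made-up placeholders):

```python
# Drastically simplified sketch of the "pipeline as model" idea behind
# kedro_mlflow's KedroPipelineModel; the real class runs actual kedro nodes
# and subclasses mlflow.pyfunc.PythonModel.
class PipelineModel:
    def __init__(self, steps):
        # steps: ordered callables, e.g. [preprocess, classify]
        self.steps = steps

    def predict(self, data):
        # Thread the data through each step, like running the pipeline.
        for step in self.steps:
            data = step(data)
        return data


def preprocess(rows):
    # Placeholder preprocessing node: scale features.
    return [[x * 2 for x in row] for row in rows]


def classify(rows):
    # Placeholder model node: threshold on the feature sum.
    return [1 if sum(row) > 5 else 0 for row in rows]


model = PipelineModel([preprocess, classify])
print(model.predict([[1, 1], [2, 2]]))  # [0, 1]
```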