# questions
m
With MLflow, you have to create a custom `PythonModel` in case you want to store a model combined with its preprocessing steps (which you always have to do imo). How can you do that with kedro (or kedro-mlflow)? The problem is that you probably fitted the preprocessors in earlier nodes and persisted the result. As far as I can tell from the docs, MLflow requires the artifacts of a custom model to be persisted on disk (which you can do with the catalog), but those path strings are not readily available inside the kedro nodes to be passed to the constructor of the pyfunc.
Any tips or ideas welcome 😀
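For context, the plain-MLflow approach being described looks roughly like this: a custom `mlflow.pyfunc.PythonModel` whose artifacts are referenced by local file paths at logging time. The class name, artifact keys, and paths below are illustrative, not from the thread.

```python
import pickle

import mlflow.pyfunc


class ModelWithPreprocessing(mlflow.pyfunc.PythonModel):
    """Serve a fitted preprocessor and estimator as a single pyfunc model."""

    def load_context(self, context):
        # MLflow downloads the logged artifacts and exposes their local paths here
        with open(context.artifacts["preprocessor"], "rb") as f:
            self.preprocessor = pickle.load(f)
        with open(context.artifacts["model"], "rb") as f:
            self.model = pickle.load(f)

    def predict(self, context, model_input):
        return self.model.predict(self.preprocessor.transform(model_input))


# Logging needs the on-disk paths of the persisted artifacts -- exactly the
# information that is not visible from inside a kedro node, which only sees
# the loaded objects, not the catalog filepaths (paths below are made up).
mlflow.pyfunc.log_model(
    artifact_path="model",
    python_model=ModelWithPreprocessing(),
    artifacts={
        "preprocessor": "data/06_models/preprocessor.pkl",
        "model": "data/06_models/regressor.pkl",
    },
)
```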
đŸ€© 1
y
Oh my favorite question. It pops up from time to time but it's very difficult to search for.
Fun fact: kedro-mlflow was built exactly to solve this problem, even before its experiment tracking features
So you have a `KedroPipelineModel` class in kedro-mlflow which enables you to create a custom model from any kedro pipeline
❀ 1
But the recommended way is to use the `pipeline_ml_factory` function to create a `PipelineML` object. It behaves like a standard kedro pipeline, but the kedro-mlflow hook will recognize it and automatically log the entire pipeline as a custom model at the end of training
You have a very detailed tutorial here : https://github.com/Galileo-Galilei/kedro-mlflow-tutorial
(just read the readme, it should be quite self-explanatory - basically the only thing to do is to convert your training pipeline with pipeline_ml_factory in the pipeline_registry.py)
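A minimal sketch of what that pipeline_registry.py change might look like; the module paths, pipeline names, and the `input_name` value are assumptions for illustration, not taken from the tutorial verbatim.

```python
from kedro.pipeline import Pipeline
from kedro_mlflow.pipeline import pipeline_ml_factory

# Hypothetical project pipelines
from my_project.pipelines import inference, training


def register_pipelines() -> dict[str, Pipeline]:
    training_pipeline = training.create_pipeline()
    inference_pipeline = inference.create_pipeline()

    # Wrap the training pipeline so the kedro-mlflow hook logs the whole
    # inference pipeline as a custom mlflow model once training has run
    training_pipeline_ml = pipeline_ml_factory(
        training=training_pipeline,
        inference=inference_pipeline,
        input_name="instances",  # the inference pipeline's free input (assumption)
    )

    return {
        "training": training_pipeline_ml,
        "inference": inference_pipeline,
        "__default__": training_pipeline_ml,
    }
```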
I'd be happy to get feedback on this
Hi @Matthias Roels did you have any chance to try this?
m
Yes and no, I did some experimental testing and did a deep dive into the code base. Overall the plugin is really great! One point of immediate improvement that I see: use `mlflow-skinny` instead of `mlflow` as a dependency (and potentially declare `mlflow` as an optional dependency)
y
Thanks for the feedback, glad to have more in-depth thoughts if you experiment further. Unfortunately I've been asked a lot to depend on mlflow-skinny instead of mlflow, but it breaks some functionalities (local UI, model registry) and I am a bit reluctant to remove them by default. I did not find a good way to make this an opt-in functionality, because mlflow does not expose it as optional requirements but as a different package, which creates namespace conflicts in Python.
m
You are right, that’s a tricky point

Today I was playing around with it a bit more and stumbled upon an issue I couldn't resolve. When you train an xgboost model, you ideally want to log it in `ubj` format, as that format is guaranteed to be compatible across different xgboost versions (which is useful for later reuse). However, there is no kedro dataset to store the model in such a way.
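One way around that gap would be a small custom dataset. A minimal sketch, assuming kedro >= 0.19 (where the base class is named `AbstractDataset`; older releases call it `AbstractDataSet`), a local filepath, and a raw `xgb.Booster` object; the class name is hypothetical.

```python
from pathlib import Path

import xgboost as xgb
from kedro.io import AbstractDataset


class XGBoostUBJDataset(AbstractDataset):
    """Hypothetical dataset that persists an XGBoost Booster as .ubj."""

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _save(self, model: xgb.Booster) -> None:
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        # xgboost picks the serialization format from the file extension,
        # so a ".ubj" path gives the version-stable Universal Binary JSON format
        model.save_model(str(self._filepath))

    def _load(self) -> xgb.Booster:
        model = xgb.Booster()
        model.load_model(str(self._filepath))
        return model

    def _describe(self) -> dict:
        return {"filepath": str(self._filepath)}
```

Registered in catalog.yml, this would be referenced by its import path (e.g. `type: my_project.datasets.XGBoostUBJDataset` with a `filepath` ending in `.ubj`), again assuming a module layout like the one sketched above.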