Hi everyone I have a question about integrating MLflow into Kedro #questions

Sudip Bhandari

04/22/2025, 2:52 PM

Hi everyone, I have a question about integrating MLflow into my Kedro project. Currently, all outputs from my Kedro project are being stored in a designated folder within the project directory (e.g., `mykedroproject/`), as specified in my `catalog.yml`. However, I've noticed that when I implement MLflow, artifacts and metrics are logged in a different location (under the `mlruns` directory). This results in the same outputs being stored twice: once through Kedro and again via MLflow. Do you have any advice on how to address this issue so that I store results only once? Ideally, I would like to have specific artifacts displayed in the MLflow UI, sourced directly from the `mykedroproject/` folder. Thanks in advance!!

👍 1

Jitendra Gundaniya

04/22/2025, 3:57 PM

Hi Sid, have you tried the `kedro-mlflow` plugin? You can follow the official guide here: Kedro-MLflow Plugin Guide. This guide explains how to configure and use the plugin effectively. If you need advanced customisation, refer to the last chapter of the guide, which details how to use hooks.

👍 1

Yolan Honoré-Rougé

04/22/2025, 6:13 PM

I am not sure about the question: MLflow duplicates your data / parameters / metrics by design, so that you keep track of the entire history, while Kedro only keeps track of the last version written during `kedro run`. If you have Kedro versioning enabled you can turn it off, but if you are using datasets without versioning, this is the intended behaviour.
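To make the versioning remark concrete: when a catalog entry is versioned, Kedro writes each save into its own timestamped folder (`<filepath>/<version>/<filename>`) instead of overwriting the previous file. Below is a minimal sketch of locating the newest version on disk with the standard library only; the helper name `latest_version_file` is hypothetical and not part of Kedro.

```python
from pathlib import Path
from typing import Optional


def latest_version_file(dataset_path: str) -> Optional[Path]:
    """Return the file stored in the newest version folder of a versioned dataset.

    Assumes the layout <dataset_path>/<version>/<filename>, where <version> is
    a timestamp such as 2025-04-22T14.52.00.000Z, so that folder names sort
    chronologically when sorted as plain strings.
    """
    root = Path(dataset_path)
    if not root.is_dir():
        return None
    # Each sub-directory is one saved version.
    versions = sorted((p for p in root.iterdir() if p.is_dir()), key=lambda p: p.name)
    if not versions:
        return None
    # Inside a version folder the file carries the dataset's own file name.
    candidate = versions[-1] / root.name
    return candidate if candidate.exists() else None
```

Without versioning there is only a single file at `dataset_path`, which is why Kedro alone keeps just the last output of `kedro run`.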

👍 1

Yolan Honoré-Rougé

04/22/2025, 6:14 PM

Usually people use a server and an S3 backend to store data, because storing every run locally can get expensive in terms of storage.
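A rough sketch of what the server-plus-S3 setup can look like from the client side, using MLflow's Python API. The helper name `use_remote_tracking` and every URI in the usage note are placeholders, not values from this thread.

```python
from typing import Optional


def use_remote_tracking(
    tracking_uri: str,
    experiment_name: str,
    artifact_location: Optional[str] = None,
) -> str:
    """Point MLflow at a tracking server and select (or create) an experiment.

    Returns the experiment ID. `artifact_location` is only applied when the
    experiment is created; an existing experiment keeps its original location.
    """
    import mlflow  # imported lazily so this module loads without MLflow installed

    mlflow.set_tracking_uri(tracking_uri)
    experiment = mlflow.get_experiment_by_name(experiment_name)
    if experiment is None:
        experiment_id = mlflow.create_experiment(
            experiment_name, artifact_location=artifact_location
        )
    else:
        experiment_id = experiment.experiment_id
    # Subsequent runs started in this process are attached to this experiment.
    mlflow.set_experiment(experiment_name)
    return experiment_id
```

For example, `use_remote_tracking("http://localhost:5000", "my-kedro-project", "s3://example-bucket/mlflow")` would send run metadata to a local server and artifacts to the (hypothetical) bucket, so nothing accumulates under a local `mlruns` folder.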

Sudip Bhandari

04/22/2025, 8:02 PM

Thank you, Jitendra and Yolan! Very helpful insights. I’ve successfully configured my setup to store artifacts, such as `regressor.pickle`, exclusively in the `mlruns` directory. However, I am facing challenges with the reverse process: I want MLflow to retrieve artifacts directly from my Kedro project directory, ensuring that my project structure remains intact (without having the same artifact duplicated in `mlruns`). Specifically, I want to maintain my organized subfolders within `mykedroproject` (e.g., `raw`, `features`, etc.) that adhere to the Kedro layer nomenclature. This arrangement makes debugging more straightforward, as I can avoid using run IDs and UUIDs assigned by MLflow. And yes, I am currently versioning my Kedro run artifacts. Any thoughts/advice on this?
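One possible workaround, offered only as a sketch and not as something `kedro-mlflow` provides out of the box: leave the files where Kedro writes them and log just a small manifest (relative paths, sizes, checksums) plus the project path to the MLflow run. The helper names `build_manifest` and `log_manifest`, the tag key, and the manifest file name are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def build_manifest(project_dir: str, patterns: Iterable[str]) -> Dict[str, dict]:
    """Map each matching file (path relative to project_dir) to its size and SHA-256."""
    root = Path(project_dir)
    manifest: Dict[str, dict] = {}
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                # read_bytes() loads the whole file; hash in chunks for large outputs.
                manifest[path.relative_to(root).as_posix()] = {
                    "bytes": path.stat().st_size,
                    "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
                }
    return manifest


def log_manifest(manifest: Dict[str, dict], project_dir: str) -> None:
    """Record where the outputs live without copying them into the artifact store."""
    import mlflow  # lazy import; logs to the active run (MLflow starts one if needed)

    mlflow.set_tag("kedro_output_dir", str(Path(project_dir).resolve()))
    mlflow.log_dict(manifest, "kedro_manifest.json")
```

Calling `log_manifest(build_manifest("mykedroproject", ["**/*.pickle"]), "mykedroproject")` would store only a few kilobytes per run. The trade-off is that the MLflow UI then shows the manifest rather than the artifacts themselves, and the checksums only help if the referenced files are not overwritten later.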

Laurens Vijnck

04/23/2025, 8:58 AM

Would be nice to have input on this one! We've also tried to do this, but it seems it's not possible to organise them as you like, since MLflow forces runs to be housed in a directory for the experiment.

💯 1
