Hi A question regarding `kedro mlflow` Is it possible to spe Kedro #plugins-integrations

Kacper Leśniara

05/16/2024, 1:20 PM

Hi! A question regarding `kedro-mlflow`. Is it possible to specify `artifact_uri` in the `mlflow.yml` configuration file? Context: I'm using `pipeline_ml_factory`, and the MLflow server that I'm pushing the artifacts to points by default to the incorrect GCP bucket. I would need to create an experiment with a different bucket.

Nok Lam Chan

05/16/2024, 2:01 PM

I have no experience with the plugin itself, but I believe there is an environment variable for it in MLflow itself.

Kacper Leśniara

05/16/2024, 2:02 PM

There is MLFLOW_TRACKING_URI and MLFLOW_REGISTRY_URI, but no MLFLOW_ARTIFACT_URI unfortunately - https://mlflow.org/docs/latest/python_api/mlflow.environment_variables.html

Yolan Honoré-Rougé

05/17/2024, 6:56 AM

Actually, the artifact_uri lives on the server side, so mlflow does not really let you modify the one set on the server. Given this https://mlflow.org/docs/latest/tracking/artifacts-stores.html#setting-a-default-artifact-location-for-logging, it may be configurable for each experiment, but we need to give it a try.

Yolan Honoré-Rougé

05/17/2024, 7:03 AM

Can you try to programmatically create an experiment in raw mlflow?

import mlflow

mlflow.set_tracking_uri("<your server>")
mlflow.log_artifact("iris.csv")  # should crash
mlflow.create_experiment("new_exp", artifact_location="<your location>")
mlflow.set_experiment("new_exp")
mlflow.log_artifact("iris.csv")  # should log where you want

Kacper Leśniara

05/17/2024, 9:57 AM

I haven't run these exact steps above ^, but our CI/CD was able to set a different artifact URI when creating the experiment through the MLflow API. I also managed to create the new experiment manually through the UI first with a different artifact URI (that's the workaround I'm using for now with Kedro).

👍 1

Kacper Leśniara

05/17/2024, 9:59 AM

Also, I looked through the code in `kedro-mlflow`, and only https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_experiment is used there, whereas https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment has the option to specify the `artifact_location`.

Kacper Leśniara

05/17/2024, 10:00 AM

`create_experiment` could be used in the logic *there, with the artifact URI propagated from the config; then it would work, afaik.
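A minimal sketch of that idea, not kedro-mlflow's actual code: the helper name `ensure_experiment` is hypothetical, and `client` is assumed to behave like `mlflow.tracking.MlflowClient` (only `get_experiment_by_name`, `create_experiment` and `restore_experiment` are called). The artifact location is passed in as a plain argument, since `mlflow.yml` has no such key in this discussion.

```python
from typing import Any, Optional


def ensure_experiment(
    client: Any, name: str, artifact_location: Optional[str] = None
) -> str:
    """Return the ID of experiment `name`, creating or restoring it if needed.

    `client` is assumed to behave like mlflow.tracking.MlflowClient.
    """
    experiment = client.get_experiment_by_name(name)
    if experiment is None:
        # The artifact location can only be chosen when the experiment is created.
        return client.create_experiment(name, artifact_location=artifact_location)
    if experiment.lifecycle_stage == "deleted":
        # A deleted experiment has to be restored rather than created again.
        client.restore_experiment(experiment.experiment_id)
    # An existing experiment keeps the artifact location it was created with.
    return experiment.experiment_id
```

With a real server this could be called as `ensure_experiment(MlflowClient(), "new_exp", "<your location>")`, followed by `mlflow.set_experiment(experiment_id=...)` with the returned ID.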

Kacper Leśniara

05/17/2024, 10:04 AM

*there, means here

Kacper Leśniara

05/20/2024, 8:33 AM

@Yolan Honoré-Rougé is it worth creating an issue about it? I can do it if needed.

Yolan Honoré-Rougé

05/20/2024, 8:08 PM

Yes you can, so we can keep track of this, but this is actually low priority for me because I am not sure it is worth the maintenance burden. My custom `_set_experiment` function used to be much more complex than this, mostly because there are edge cases to deal with (deleted experiments which need to be restored rather than created, setting the experiment globally if the user works interactively, creating experiments which don't exist...), and I used to have a couple of issues with this function. Moreover, `kedro-mlflow` should work if you create the experiment once and for all (with your CLI command) and then use it in the `mlflow.yml`. The only case where you will need kedro-mlflow to handle it for you is if you create the experiment dynamically (maybe based on some `runtime_params`?), but this is likely very rare. Is that what you are doing?

Kacper Leśniara

05/21/2024, 9:40 AM

Yes, that's what I'm doing: creating a new mlflow experiment by running kedro (but then the artifact_uri is wrongly set by the mlflow server). The workaround is that I first created the experiment once and for all with a different artifact_uri and then ran kedro. I'm not using `runtime_params` yet to specify the experiment name. The problem with that approach, however, is that this is tribal knowledge that needs to be put somewhere in the docs. When someone changes the experiment name or creates a new kedro project, the process of creating the experiment must be repeated manually (before doing `kedro run`, because otherwise the experiment will be locked to the wrong uri). That's an inconvenience. I like things happening automatically on proper configuration 😅 The other partial solution would be ofc to change the default artifact_uri in the mlflow server to the proper bucket. But the convention here is to save artifacts to `gs://<bucket_name>/<model_name>`, and that we would need to specify ourselves, as the mlflow server points only to one uri. I will create the issue soon.
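As an illustration of the bucket/model convention mentioned above, here is a small sketch that builds the per-model artifact location. The function name `artifact_location_for` is hypothetical, and the standard `gs://` scheme for Google Cloud Storage is assumed.

```python
def artifact_location_for(bucket_name: str, model_name: str) -> str:
    """Build the artifact location following the <bucket_name>/<model_name> convention."""
    bucket = bucket_name.strip("/")
    model = model_name.strip("/")
    if not bucket or not model:
        raise ValueError("bucket_name and model_name must both be non-empty")
    # Assumption: artifacts live in Google Cloud Storage, hence the gs:// scheme.
    return f"gs://{bucket}/{model}"
```

The result could then be passed as `artifact_location` when the experiment is first created, since MLflow fixes an experiment's artifact location at creation time.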

👍 1

Yolan Honoré-Rougé

05/22/2024, 6:52 AM

Yes I definitely understand the value of doing it automatically. Do not hesitate to open a PR :)

🙌 1

Kacper Leśniara

05/28/2024, 9:03 AM

Hey Yolan! Thanks for the discussion 🙌 I created the issue here - https://github.com/Galileo-Galilei/kedro-mlflow/issues/557

👍 1
