Kacper Leśniara
05/16/2024, 1:20 PM
kedro-mlflow. Is it possible to specify artifact_uri in the mlflow.yml configuration file?
Context:
I'm using pipeline_ml_factory. The MLflow server that I'm pushing the artifacts to points to the incorrect GCP bucket by default, so I would need to create an experiment with a different bucket.
Nok Lam Chan
05/16/2024, 2:01 PM
Kacper Leśniara
05/16/2024, 2:02 PM
Yolan Honoré-Rougé
05/17/2024, 6:56 AM
Yolan Honoré-Rougé
05/17/2024, 7:03 AM
import mlflow

mlflow.set_tracking_uri("<your server>")
mlflow.log_artifact("iris.csv")  # should crash
mlflow.create_experiment("new_exp", artifact_location="<your location>")
mlflow.set_experiment("new_exp")
mlflow.log_artifact("iris.csv")  # should log where you want
Kacper Leśniara
05/17/2024, 9:57 AM
Kacper Leśniara
05/17/2024, 9:59 AM
kedro-mlflow and there only https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_experiment is used, but https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment has the option to specify the artifact_location.
Kacper Leśniara
05/17/2024, 10:00 AM
create_experiment could be used in the logic there, with the artifact_uri propagated from the config; then it would work afaik.
Kacper Leśniara
05/17/2024, 10:04 AM
Kacper Leśniara
05/20/2024, 8:33 AM
Yolan Honoré-Rougé
05/20/2024, 8:08 PM
_set_experiment function used to be much more complex than this, mostly because there are edge cases to deal with (a deleted experiment which needs to be restored rather than created, setting the experiment globally if the user works interactively, creating an experiment which doesn't exist...), and I used to have a couple of issues with this function.
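A rough sketch of those edge cases, combined with the artifact_location idea from earlier in the thread. This is not the actual kedro-mlflow code: ensure_experiment is a hypothetical helper, and client is assumed to behave like mlflow.tracking.MlflowClient (get_experiment_by_name, create_experiment, restore_experiment).

```python
from typing import Any, Optional


def ensure_experiment(
    client: Any, name: str, artifact_location: Optional[str] = None
) -> str:
    """Return the id of experiment `name`, creating or restoring it if needed.

    `client` is assumed to expose the same methods as mlflow.tracking.MlflowClient.
    """
    experiment = client.get_experiment_by_name(name)

    if experiment is None:
        # The experiment does not exist yet: creation is the point where
        # a custom artifact location can be passed.
        return client.create_experiment(name, artifact_location=artifact_location)

    # Plain string comparison (ignoring a trailing slash); a store that
    # normalizes locations differently would need a smarter check.
    if (
        artifact_location is not None
        and experiment.artifact_location.rstrip("/") != artifact_location.rstrip("/")
    ):
        raise ValueError(
            f"Experiment {name!r} already stores artifacts in "
            f"{experiment.artifact_location!r}, not {artifact_location!r}"
        )

    if experiment.lifecycle_stage == "deleted":
        # A deleted experiment keeps its name, so it has to be restored
        # rather than created again.
        client.restore_experiment(experiment.experiment_id)

    return experiment.experiment_id
```

With a real server this would be called with mlflow.tracking.MlflowClient() after mlflow.set_tracking_uri(...), followed by mlflow.set_experiment(name) to make the experiment active globally.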
Moreover, kedro-mlflow should work if you create the experiment once and for all (with your CLI command) and then use it in the mlflow.yml. The only case where you will need kedro-mlflow to handle it for you is if you create the experiment dynamically (maybe based on some runtime_params?), but this is likely very rare. Is that what you are doing?
Kacper Leśniara
05/21/2024, 9:40 AM
runtime_params yet to specify the experiment name.
The problem with that approach, however, is that this is tribal knowledge that needs to be put somewhere in the docs. When someone changes the experiment name or creates a new kedro project, the process of creating the experiment must be repeated manually (before doing kedro run, because otherwise the experiment will be locked to the wrong uri). That's an inconvenience. I like things happening automatically on proper configuration 😅
The other partial solution would be ofc to change the default artifact_uri in the mlflow server to the proper bucket. But the convention here is to save artifacts to gs://<bucket_name>/<model_name>, and that we would need to specify ourselves, as the mlflow server points only to one uri.
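A minimal sketch of that naming convention, assuming the bucket and model name are available as plain strings. artifact_location_for is a hypothetical helper, not part of mlflow or kedro-mlflow; its result is what would be passed as artifact_location when creating the experiment.

```python
def artifact_location_for(bucket_name: str, model_name: str) -> str:
    """Build gs://<bucket_name>/<model_name> from its two parts."""
    # Strip stray slashes so the parts join with exactly one separator.
    bucket = bucket_name.strip("/")
    model = model_name.strip("/")
    if not bucket or not model:
        raise ValueError("bucket_name and model_name must be non-empty")
    return f"gs://{bucket}/{model}"
```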
I will create the issue soon.
Yolan Honoré-Rougé
05/22/2024, 6:52 AM
Kacper Leśniara
05/28/2024, 9:03 AM