# plugins-integrations
k
Hi! A question regarding `kedro-mlflow`. Is it possible to specify `artifact_uri` in the `mlflow.yml` configuration file? Context: I'm using `pipeline_ml_factory`, and the MLflow server that I'm pushing the artifacts to points by default to the incorrect GCP bucket. I would need to create an experiment with a different bucket.
n
I have no experience with the plugin itself, but I believe there is an environment variable for it in MLflow itself.
k
There are `MLFLOW_TRACKING_URI` and `MLFLOW_REGISTRY_URI`, but unfortunately no `MLFLOW_ARTIFACT_URI` - https://mlflow.org/docs/latest/python_api/mlflow.environment_variables.html
y
Actually the artifact_uri is something which lives on the server side, so mlflow does not really let you modify the ones on the server. Given this https://mlflow.org/docs/latest/tracking/artifacts-stores.html#setting-a-default-artifact-location-for-logging, it may be configurable for each experiment, but we need to give it a try.
Can you try to programmatically create an experiment in raw mlflow?
import mlflow
mlflow.set_tracking_uri("<your server>")
# log_artifact takes a single file (log_artifacts expects a directory)
mlflow.log_artifact("iris.csv")  # should crash
mlflow.create_experiment("new_exp", artifact_location="<your location>")
mlflow.set_experiment("new_exp")
mlflow.log_artifact("iris.csv")  # should log where you want
k
Haven't done these exact steps above ^, but changing the artifact-uri when creating the experiment by our CI/CD was possible using MLFlow API. I also managed to create the new experiment manually through the UI first with a different artifact-uri (that's the workaround that I used for now when using Kedro)
👍 1
Also, I looked through the code in `kedro-mlflow`, and only https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_experiment is used there, whereas https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.create_experiment has the option to specify the `artifact_location`.
If `create_experiment` could be used in the logic *there, and the artifact-uri propagated from the config, then it would work afaik
*there, means here
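A rough sketch of what that could look like, not the actual `kedro-mlflow` code: the helper name `get_or_create_experiment` is made up for illustration, and `client` is assumed to be an `mlflow.tracking.MlflowClient` (anything with the same `get_experiment_by_name` / `create_experiment` methods would do).

```python
from typing import Any, Optional


def get_or_create_experiment(
    client: Any, name: str, artifact_location: Optional[str] = None
) -> str:
    """Return the ID of experiment `name`, creating it if it does not exist.

    Hypothetical helper: `client` is assumed to behave like
    mlflow.tracking.MlflowClient.
    """
    experiment = client.get_experiment_by_name(name)
    if experiment is not None:
        # The experiment already exists, so its artifact location is already
        # fixed; the requested `artifact_location` is ignored in this case.
        return experiment.experiment_id
    # Only at creation time can the artifact location be chosen.
    return client.create_experiment(name, artifact_location=artifact_location)


# Possible usage (placeholders, not real values):
#   from mlflow.tracking import MlflowClient
#   client = MlflowClient(tracking_uri="<your server>")
#   get_or_create_experiment(client, "new_exp", "<your location>")
```

The `artifact_location` value would then have to come from the config, which is the part that does not exist today.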
@Yolan Honoré-Rougé is it worth creating an issue about it? I can do it if needed
y
Yes you can, so we can keep track of this, but actually this is low priority for me because I am not sure it is worth the maintenance burden. My custom `_set_experiment` function used to be much more complex than this, mostly because there are edge cases to deal with (deleted experiments which need to be restored and not created, setting the experiment globally if the user works interactively, creating experiments which don't exist...), and I used to have a couple of issues with this function. Moreover, `kedro-mlflow` should work if you create the experiment once and for all (with your CLI command) and then use it in the `mlflow.yml`. The only case where you will need kedro-mlflow to handle it for you is if you create the experiment dynamically (maybe based on some `runtime_params`?), but this is likely very rare. Is that what you are doing?
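To illustrate the "restored and not created" edge case, here is a minimal sketch. It is not the real `_set_experiment`; the function name `ensure_active_experiment` is hypothetical, `client` is assumed to be an `mlflow.tracking.MlflowClient`, and the sketch assumes `get_experiment_by_name` also returns experiments whose `lifecycle_stage` is `"deleted"`.

```python
from typing import Any, Optional


def ensure_active_experiment(
    client: Any, name: str, artifact_location: Optional[str] = None
) -> str:
    """Return the ID of an active experiment called `name`.

    Hypothetical helper; `client` is assumed to behave like
    mlflow.tracking.MlflowClient.
    """
    experiment = client.get_experiment_by_name(name)
    if experiment is None:
        # Nothing with that name yet: create it, choosing the location now.
        return client.create_experiment(name, artifact_location=artifact_location)
    if experiment.lifecycle_stage == "deleted":
        # A soft-deleted experiment still owns its name, so it has to be
        # restored rather than created again.
        client.restore_experiment(experiment.experiment_id)
    return experiment.experiment_id
```

Note that a restored experiment keeps whatever artifact location it was created with, so the `artifact_location` argument only matters on first creation.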
k
Yes, that's what I'm doing - creating a new mlflow experiment by running kedro (but then the artifact_uri is wrongly set by the mlflow server). The workaround is that I first created the experiment once and for all with a different artifact_uri and then ran kedro. I'm not using `runtime_params` yet to specify the experiment name. The problem with that approach, however, is that this is tribal knowledge that needs to be put somewhere in the docs. When someone changes the experiment name or creates a new kedro project, the process of creating the experiment must be repeated manually (before doing `kedro run`, because otherwise the experiment will be locked to the wrong uri). That's an inconvenience. I like things happening automatically on proper configuration 😅 The other partial solution would be ofc to change the default artifact_uri in the mlflow server to the proper bucket. But the convention here is to save artifacts to `gs://<bucket_name>/<model_name>`, and that we would need to specify ourselves, as the mlflow server points only to one uri. I will create the issue soon
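Until then, a small pre-flight check before `kedro run` could at least catch the "locked to the wrong uri" case. This is only a sketch: `check_experiment_location` is a made-up name, `client` is assumed to be an `mlflow.tracking.MlflowClient`, and the expected location simply follows the `gs://<bucket_name>/<model_name>` convention.

```python
from typing import Any


def check_experiment_location(
    client: Any, experiment_name: str, bucket_name: str, model_name: str
) -> str:
    """Compare an experiment's artifact location with the team convention.

    Hypothetical helper; `client` is assumed to behave like
    mlflow.tracking.MlflowClient. Returns one of:
      "missing"  - the experiment does not exist yet (safe to create it)
      "ok"       - it exists and points to gs://<bucket_name>/<model_name>
      "mismatch" - it exists but points somewhere else
    """
    expected = f"gs://{bucket_name}/{model_name}"
    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is None:
        return "missing"
    # Ignore a trailing slash so that "gs://b/m/" and "gs://b/m" compare equal.
    actual = (experiment.artifact_location or "").rstrip("/")
    return "ok" if actual == expected else "mismatch"
```

A "missing" result means the experiment can still be created with the right location; a "mismatch" means it was already created with the server default and needs a new name or manual cleanup.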
👍 1
y
Yes I definitely understand the value of doing it automatically. Do not hesitate to open a PR :)
🙌 1
k
Hey Yolan! Thanks for the discussion 🙌 I created the issue here - https://github.com/Galileo-Galilei/kedro-mlflow/issues/557
👍 1