I’d like to log figures in kedro-mlflow through the mlflow l... | Kedro #plugins-integrations

Hugo Evers

05/31/2024, 2:13 PM

I’d like to log figures in kedro-mlflow through the mlflow log_figure method. I saw it was mentioned in a discussion from quite some time ago, but it was never implemented. I gather I can do something like:

dependancy_figure:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
    dataset:
        type: kedro_datasets.plotly.JSONDataset
        filepath: ....

However, when testing whether this would give the same result as mlflow.log_figure(), I got issues with Kedro assuming the dataset is versioned while the S3 bucket is not. Anyway, I’d propose making a dataset to do this. Would the authors be open to a PR? If so, do you have opinions on the implementation or naming: should I subclass MlflowArtifactDataset, or another class?

Nok Lam Chan

05/31/2024, 2:16 PM

However, when testing whether this would give the same result as mlflow.log_figure(), I got issues with Kedro assuming the dataset is versioned while the S3 bucket is not.

Can you show the error?

Hugo Evers

05/31/2024, 2:24 PM

catalog.yml:

shap_dependency_figure:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
    dataset:
        type: kedro_datasets.plotly.JSONDataset
        filepath: s3://aw-science/performance_optimisation/${oc.env:ENV}/${oc.env:CLIENT_NAME}/data/08_reporting/shap_dependency.json

code:

import plotly.express as px

fig = px.bar(x=["a", "b", "c"], y=[1, 3, 2])

catalog.save("shap_dependency_figure", fig)

error:

DatasetError: Cannot save versioned dataset 'shap_dependency.json' to 'aw-science/performance_optimisation/dev/Reed/data/08_reporting' because a file with the same name already exists in the directory. This is likely because versioning was enabled on a dataset already saved previously. Either remove 'shap_dependency.json' from the directory or manually convert it into a versioned dataset by placing it in a versioned directory (e.g. with default versioning format 'aw-science/performance_optimisation/dev/Reed/data/08_reporting/shap_dependency.json/YYYY-MM-DDThh.mm.ss.sssZ/shap_dependency.json').
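
For anyone reading the error above, here is a minimal sketch of the versioned directory layout the message describes (filepath, then a timestamp folder, then the file name again). The function names make_version and versioned_path are hypothetical helpers written for illustration; this is not Kedro's own code, and the timestamp pattern is only assumed from the YYYY-MM-DDThh.mm.ss.sssZ placeholder in the message.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional

# Assumed pattern, chosen to match the "YYYY-MM-DDThh.mm.ss.sssZ" placeholder in the error.
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def make_version(now: Optional[datetime] = None) -> str:
    """Return a millisecond-precision UTC timestamp to use as a version folder name."""
    now = now or datetime.now(tz=timezone.utc)
    stamp = now.strftime(VERSION_FORMAT)  # %f yields six digits of microseconds
    return stamp[:-4] + stamp[-1:]        # drop the last three digits, keep the trailing Z


def versioned_path(filepath: str, version: str) -> str:
    """Build <filepath>/<version>/<filename>, the layout named in the error message."""
    path = PurePosixPath(filepath)
    return str(path / version / path.name)
```

With this layout, a plain file that already sits at the filepath location occupies the spot where the versioned directory would go, which is the conflict the message complains about.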

Hugo Evers

05/31/2024, 2:25 PM

I can’t paste the entire traceback, but in the middle the error indicates that the local file specified for logging as an artifact does not exist. The issue arises during the attempt to upload the file to S3 via MLflow.

Hugo Evers

05/31/2024, 2:27 PM

So, funnily enough, if I do this:

shap_dependency_figure:
    type: kedro_datasets.plotly.JSONDataset
    filepath: s3://aw-science/performance_optimisation/${oc.env:ENV}/${oc.env:CLIENT_NAME}/data/08_reporting/shap_dependency.json

it saves just fine

Hugo Evers

05/31/2024, 2:27 PM

And of course, when I do this:

mlflow.log_figure(fig, "shap_dependency_figure.html")

it also logs just fine, and I can view the plot in MLflow

Juan Luis

05/31/2024, 2:28 PM

oh this error is typical when you've used a versioned dataset a minute ago, then you modify the config, then it complains

Juan Luis

05/31/2024, 2:28 PM

if you can, try doing

rm -r aw-science/performance_optimisation/dev/Reed/data/08_reporting

(or make a backup somewhere else)

Hugo Evers

05/31/2024, 2:29 PM

hmm, but none of those files are versioned, there are versioned files in the bucket though

Hugo Evers

05/31/2024, 2:29 PM

but not that folder

Hugo Evers

05/31/2024, 2:30 PM

But so, would it be interesting to create an mlflow dataset that performs log_figure? Or does the mlflow artefact dataset do the same thing?

Juan Luis

05/31/2024, 2:39 PM

but not that folder

hmm okay, I see it now: https://kedro-mlflow.readthedocs.io/en/stable/source/04_experimentation_tracking/03_version_datasets.html#how-to-version-data-in-a-kedro-project

Juan Luis

05/31/2024, 2:39 PM

# must be a local file, wherever you want to log the data in the end

Hugo Evers

05/31/2024, 2:39 PM

Ahh okay, I just wrote the plot to a local directory:

shap_dependency_figure:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
    dataset:
        type: kedro_datasets.plotly.JSONDataset
        filepath: data/08_reporting/shap_dependency.html

The thing is, regardless of the file extension (html, json, etc.), in MLflow you will see JSON and not a figure. That makes sense, because the underlying dataset is saving JSON.

Juan Luis

05/31/2024, 2:39 PM

so using filepath: s3 with MlflowArtifactDataset isn't supported
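
The constraint above (the wrapped dataset's filepath must be a local file) could be checked up front with something like the sketch below. The helper name is_local_filepath is hypothetical and not part of kedro-mlflow; it only inspects the URL scheme of the path string.

```python
from urllib.parse import urlparse


def is_local_filepath(filepath: str) -> bool:
    """Return True when filepath carries no remote scheme such as s3:// or gs://."""
    scheme = urlparse(filepath).scheme
    # An empty scheme is a plain path, "file" is an explicit local URI, and a
    # single letter is most likely a Windows drive letter (for example C:\data).
    return scheme in ("", "file") or len(scheme) == 1
```

Under this check, data/08_reporting/shap_dependency.html counts as local, while any s3:// path does not.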

Juan Luis

05/31/2024, 2:40 PM

The thing is, regardless of the file extension (html, json, etc.), in MLflow you will see JSON and not a figure. That makes sense, because the underlying dataset is saving JSON.

yup I see it now. pinging @Yolan Honoré-Rougé but maybe it's better that you continue in that discussion you pointed out https://github.com/Galileo-Galilei/kedro-mlflow/discussions/338

Juan Luis

05/31/2024, 2:40 PM

(or open a new discussion/issue)

Hugo Evers

05/31/2024, 2:41 PM

But what I want is the same outcome as mlflow.log_figure, so that means either adding a plotly dataset that saves figures to HTML and then using kedro_mlflow.io.artifacts.MlflowArtifactDataset, or adding a MlflowFigure dataset
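
The first option (save the figure as HTML locally, then let MLflow pick up the file) might look roughly like this at its core. The helper save_figure_as_html is a hypothetical name; the only thing it assumes about fig is that it has Plotly's write_html method. Wrapping this logic in a proper Kedro dataset class is left out of the sketch.

```python
from pathlib import Path
from typing import Any


def save_figure_as_html(fig: Any, filepath: str) -> Path:
    """Write a Plotly figure to a local HTML file and return its path."""
    path = Path(filepath)
    path.parent.mkdir(parents=True, exist_ok=True)  # make sure the target folder exists
    # write_html produces a page that a browser (or the MLflow UI) can render;
    # include_plotlyjs="cdn" links plotly.js instead of embedding it in the file.
    fig.write_html(str(path), include_plotlyjs="cdn")
    return path
```

The returned path is a local HTML file, so it could then be logged with mlflow.log_artifact(str(path)), which is the kind of local file MlflowArtifactDataset expects.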

Hugo Evers

05/31/2024, 2:42 PM

My personal preference would be the latter, because then I don’t have to deal with the save locations in production

Hugo Evers

05/31/2024, 2:42 PM

(because we render huge dashboards in html and then save those, so saving locally could cause issues, saving directly to s3 would not)

Yolan Honoré-Rougé

06/04/2024, 8:56 PM

Sorry, I forgot to answer this thread. I indeed saw little value in adding a dataset around log_figure, because wrapping with MlflowArtifactDataset is supposed to cover all use cases, but I am totally open to a PR if you want to avoid dealing with the local path manually.
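
As a starting point for such a PR, the second option could be sketched as below. Everything here is an assumption: MlflowFigureDataset is a hypothetical class that does not exist in kedro-mlflow, and to keep the sketch dependency-free it does not subclass kedro.io.AbstractDataset, which a real implementation would need to do. The only real API it relies on is mlflow.log_figure(figure, artifact_file).

```python
from typing import Any, Callable, Optional


class MlflowFigureDataset:
    """Hypothetical save-only dataset that sends a figure straight to MLflow."""

    def __init__(
        self,
        artifact_file: str,
        log_figure: Optional[Callable[[Any, str], None]] = None,
    ) -> None:
        self._artifact_file = artifact_file  # for example "shap_dependency_figure.html"
        self._log_figure = log_figure        # injectable so the sketch can be tried without MLflow

    def save(self, fig: Any) -> None:
        log_figure = self._log_figure
        if log_figure is None:
            import mlflow  # imported lazily so the class can be defined without MLflow installed
            log_figure = mlflow.log_figure
        log_figure(fig, self._artifact_file)

    def load(self) -> Any:
        raise NotImplementedError("figures logged to MLflow are write-only in this sketch")
```

Because the figure goes through log_figure, no local filepath appears in the catalog entry, which is the part of the workflow the thread was trying to avoid.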

Hugo Evers

06/05/2024, 9:08 AM

Well, it’s not so much the local path as storing my plots as HTML
