# plugins-integrations
h
I’d like to log figures in kedro-mlflow through the MLflow `log_figure` method. I saw it mentioned in a discussion quite some time ago, but it was never implemented. I gather I can do something like:
```yaml
dependancy_figure:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
    dataset:
        type: kedro_datasets.plotly.JSONDataset
        filepath: ....
```
However, when testing whether this would give the same result as `mlflow.log_figure()`, I ran into issues with Kedro assuming the dataset is versioned while the S3 bucket is not. Anyway, I’d propose creating a dataset that does this. Would the authors be open to a PR? And if so, do you have opinions on the implementation and naming? Should I subclass `MlflowArtifactDataset`, or another class?
n
> However, when testing whether this would give the same result as `mlflow.log_figure()`, I ran into issues with Kedro assuming the dataset is versioned while the S3 bucket is not.
Can you show the error?
h
catalog.yml:
```yaml
shap_dependency_figure:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
    dataset:
        type: kedro_datasets.plotly.JSONDataset
        filepath: s3://aw-science/performance_optimisation/${oc.env:ENV}/${oc.env:CLIENT_NAME}/data/08_reporting/shap_dependency.json
```
code:
```python
import plotly.express as px

fig = px.bar(x=["a", "b", "c"], y=[1, 3, 2])

# `catalog` is assumed to be the DataCatalog preloaded by `kedro ipython` / `kedro jupyter`
catalog.save("shap_dependency_figure", fig)
```
error:
```text
DatasetError: Cannot save versioned dataset 'shap_dependency.json' to
'aw-science/performance_optimisation/dev/Reed/data/08_reporting' because a file with the same name already exists
in the directory. This is likely because versioning was enabled on a dataset already saved previously. Either
remove 'shap_dependency.json' from the directory or manually convert it into a versioned dataset by placing it in a
versioned directory (e.g. with default versioning format
'aw-science/performance_optimisation/dev/Reed/data/08_reporting/shap_dependency.json/YYYY-MM-DDThh.mm.ss.sssZ/shap_dependency.json').
```
I can't paste the entire traceback, but partway through it indicates that the local file specified for logging as an artifact does not exist. The failure happens when MLflow attempts to upload the file to S3.
Funnily enough, if I do this:
```yaml
shap_dependency_figure:
    type: kedro_datasets.plotly.JSONDataset
    filepath: s3://aw-science/performance_optimisation/${oc.env:ENV}/${oc.env:CLIENT_NAME}/data/08_reporting/shap_dependency.json
```
it saves just fine.
And of course, when I do this:
```python
mlflow.log_figure(fig, "shap_dependency_figure.html")
```
it also logs just fine, and I can view the plot in MLflow.
j
Oh, this error is typical when you've used a versioned dataset a minute ago and then modified the config; after that, it complains.
If you can, try running `rm -r aw-science/performance_optimisation/dev/Reed/data/08_reporting` (or make a backup somewhere else first).
h
Hmm, but none of those files are versioned. There are versioned files in the bucket, but not in that folder.
But anyway, would it be worth creating an MLflow dataset that calls `log_figure`? Or does the MLflow artifact dataset do the same thing?
```yaml
# must be a local file, wherever you want to log the data in the end
```
h
Ahh, okay. I just wrote the plot to a local directory:
```yaml
shap_dependency_figure:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
    dataset:
        type: kedro_datasets.plotly.JSONDataset
        filepath: data/08_reporting/shap_dependency.html
```
The thing is, regardless of the file extension (HTML, JSON, etc.), in MLflow you will see JSON and not a figure. That makes sense, because the underlying dataset saves JSON.
j
So using `filepath: s3` with `MlflowArtifactDataset` isn't supported.
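To catch this before a run fails midway, a project could check the inner `filepath` up front. A minimal stdlib sketch; `is_local_filepath` is a hypothetical helper, not part of kedro-mlflow, and it simply treats any URL scheme other than `file` (e.g. `s3://`, `gs://`) as remote:

```python
from urllib.parse import urlsplit


def is_local_filepath(filepath: str) -> bool:
    """Return True if `filepath` looks like a path on the local filesystem.

    Hypothetical helper: it only inspects the URL scheme and does not check
    that the file or its parent directory exists.
    """
    scheme = urlsplit(filepath).scheme.lower()
    # ""         -> plain relative or absolute path, e.g. "data/08_reporting/x.html"
    # "file"     -> explicit file:// URI
    # one letter -> Windows drive letter parsed as a scheme, e.g. "C:\\data\\x.html"
    return scheme in ("", "file") or len(scheme) == 1
```

A hook or a unit test over the catalog config could call this on each `MlflowArtifactDataset` entry's inner `filepath` and fail early on `s3://` paths.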
> The thing is, regardless of the file extension (HTML, JSON, etc.), in MLflow you will see JSON and not a figure. That makes sense, because the underlying dataset saves JSON.
Yup, I see it now. Pinging @Yolan Honoré-Rougé, but it may be better to continue in the discussion you pointed out: https://github.com/Galileo-Galilei/kedro-mlflow/discussions/338
(or open a new discussion/issue)
h
But what I want is the same outcome as `mlflow.log_figure`, so that means either adding a Plotly dataset that saves figures to HTML and then wrapping it in `kedro_mlflow.io.artifacts.MlflowArtifactDataset`, or adding an `MlflowFigure` dataset.
My personal preference would be the latter, because then I don't have to deal with save locations in production
(we render huge dashboards in HTML and then save those, so saving locally could cause issues, whereas saving directly to S3 would not).
y
Sorry, I forgot to answer this thread. I indeed saw little value in adding a dataset around `log_figure`, because wrapping with `MlflowArtifactDataset` is supposed to cover all use cases, but I am totally open to a PR if you want to avoid dealing with the local path manually.
h
Well, it's not so much the local path as storing my plots as HTML.