# questions
j
At the end of my data science pipeline I need to save multiple plots. The number of plots depends on the model's hyperparameters; there could be around 5-30 plots. How would I do this with Kedro? I took a look at https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.matplotlib.MatplotlibWriter.html. However, there is only one example using the YAML API (which I think I need in order to see the pictures when looking at my experiments through Kedro-Viz), and in that example only one plot is saved. There are also examples where a list of plots is saved, but those use the Python API, and with that approach I can't figure out how to get the list of images to be displayed in the experiments section in Kedro-Viz.
i
For the first part, you could implement your own dataset that saves a list of matplotlib figures to a directory using MatplotlibWriter in a loop. As far as displaying in Kedro-Viz, I'm not sure.
d
Saving either a list of plots or a single plot can be done via the YAML API; the difference is whether or not you return a list from the node that saves to the MatplotlibWriter dataset.
If you return a list from the node, how does it look in Kedro-Viz?
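A minimal sketch of a node that returns a list of figures, to make the point above concrete. The function name `make_plots` and the `n_plots` parameter are hypothetical; in a real project the number of plots would come from the model's hyperparameters, and the node's output would be mapped to the MatplotlibWriter catalog entry.

```python
from typing import List

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.figure import Figure


def make_plots(n_plots: int) -> List[Figure]:
    """Hypothetical node: build one figure per plot and return them as a list."""
    figures = []
    for i in range(n_plots):
        fig, ax = plt.subplots()
        ax.plot([0, 1, 2], [0, i, 2 * i])  # placeholder data
        ax.set_title(f"plot {i}")
        figures.append(fig)
    return figures
```

Returning a single `Figure` instead of the list is the only change needed for the one-plot case; the catalog entry type stays the same.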
j
@Deepyaman Datta This is the example from https://kedro.readthedocs.io/en/stable/kedro.extras.datasets.matplotlib.MatplotlibWriter.html to save one plot:
output_plot:
  type: matplotlib.MatplotlibWriter
  filepath: data/08_reporting/output_plot.png
  save_args:
    format: png
I can't figure out how to modify the YAML for the case where there is a list of plots that we want to save. What should the filepath argument be, for example?
d
If you look at the implementation under https://kedro.readthedocs.io/en/stable/_modules/kedro/extras/datasets/matplotlib/matplotlib_writer.html#MatplotlibWriter, you'll see that it will write to data/08_reporting/output_plot.png/0.png, data/08_reporting/output_plot.png/1.png, etc. If you want to control the names, you can return a dictionary instead of a list. Ideally, seeing the filepaths above, you would want to specify a directory-like name (rather than a filename) as the filepath argument. Relevant snippet from that link:
if isinstance(data, list):
    for index, plot in enumerate(data):
        full_key_path = get_filepath_str(
            save_path / f"{index}.png", self._protocol
        )
        self._save_to_fs(full_key_path=full_key_path, plot=plot)
(Of course, it would also be better if this were more clearly documented, and you didn't have to understand the implementation, but just trying to help for now)
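To illustrate the naming described above, here is a small stand-alone sketch that only computes the target paths; it does not call Kedro. The helper `target_paths` is hypothetical. The list branch mirrors the quoted snippet, while the dictionary branch assumes that keys are used as file names under the filepath. Versioning, which adds a timestamp directory, is ignored here.

```python
from pathlib import PurePosixPath


def target_paths(filepath, data):
    """Return the paths a save would produce for list, dict, or single-object data."""
    save_path = PurePosixPath(filepath)
    if isinstance(data, list):
        # Mirrors the quoted snippet: files are named by list index.
        return [str(save_path / f"{index}.png") for index, _ in enumerate(data)]
    if isinstance(data, dict):
        # Assumption: dictionary keys become file names under save_path.
        return [str(save_path / name) for name in data]
    # A single plot is written to the filepath itself.
    return [str(save_path)]


# Strings stand in for matplotlib figures; only the naming matters here.
print(target_paths("data/08_reporting/output_plot.png", ["fig_a", "fig_b"]))
print(target_paths("data/08_reporting/output_plots", {"loss.png": "fig_a", "roc.png": "fig_b"}))
```

The first call shows why a filename-like filepath produces odd paths such as output_plot.png/0.png, and the second shows the cleaner layout obtained with a directory-like filepath and a dictionary.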
j
Awesome, thanks!
@Deepyaman Datta I am just now getting back to this. I now have the following dataset entry in my catalog:
output_plot:
  type: matplotlib.MatplotlibWriter
  filepath: data/08_reporting/output_plots
  save_args:
    format: png
  versioned: true
If my pipeline returns a single plot, everything works fine and I can see the plot in the experiments section in Kedro-Viz. However, if I return a list of plots, the plots are still created but I can't see them through Kedro-Viz. I also see the following warning in the terminal where Kedro-Viz was started:
'output_plot' with version '2022-12-22T19.00.19.079Z' could not be loaded. Full exception: DataSetError: Failed while loading data from data set    experiment_tracking.py:101
                             MatplotlibWriter(filepath=[my project path]/data/08_reporting/output_plots, protocol=file, save_args={'format': png},                                         
                             version=Version(load='2022-12-22T19.00.19.079Z', save=None)).                                                                                                                 
                             [Errno 21] Is a directory: '[my project path]/data/08_reporting/output_plots/2022-12-22T19.00.19.079Z/output_plots'
Is this as designed? It would be nice to see the list of plots displayed by Kedro-Viz as well.
o
@Steeve Ndjila same situation we are having!
d
To be honest, I don't know much on the Viz side, but I'm guessing (without looking at the code) it just wasn't written with the use case of handling a directory of paths in mind. @Tynan or @Rashida Kanchwala or somebody else may know better, although I think most of the core team is off. Let me see if I can take a look at the code for my own knowledge later.
t
@Jaakko which version of Kedro are you using?
j
@Tynan kedro version 0.18.3
kedro-viz 5.1.1
t
Thanks. @Deepyaman Datta is right: Viz isn't written to handle this use case. What we handle is one plot per metric per run, not multiple plots.
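One possible workaround, not proposed in the thread itself: since a single figure is the case reported as working, the individual plots could be drawn as subplots of one figure and returned as a single object. The sketch below assumes each plot can be expressed as a list of values; `make_combined_plot`, `series` and `ncols` are hypothetical names.

```python
import math
from typing import List

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.figure import Figure


def make_combined_plot(series: List[List[float]], ncols: int = 3) -> Figure:
    """Hypothetical node: draw every series in its own subplot of one figure."""
    nrows = max(1, math.ceil(len(series) / ncols))
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False
    )
    flat_axes = axes.ravel()
    for i, (ax, values) in enumerate(zip(flat_axes, series)):
        ax.plot(values)
        ax.set_title(f"plot {i}")
    for ax in flat_axes[len(series):]:
        ax.set_visible(False)  # hide grid cells that have no plot
    fig.tight_layout()
    return fig
```

With 5-30 plots the grid can get large, so the figure size or column count would likely need tuning. Whether the result is readable in the experiments view is something to check in your own setup.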