Hi all, I have a question possibly more about s3fs than kedro (Kedro #questions)


viveca

04/12/2023, 3:45 PM

Hi all, I have a question possibly more about s3fs than kedro, but at least it’s related 🙂 I’m trying to export a plotly figure to S3 using `fig.write_html()`, so I made a simple custom dataset:


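```python
import plotly.graph_objects as go
from kedro.extras.datasets.json import JSONDataSet
from kedro.io.core import get_filepath_str


class PlotlyHTMLDataSet(JSONDataSet):
    """Export a plotly figure to an HTML file."""

    def _save(self, data: go.Figure) -> None:
        # Resolve the (possibly versioned) save path for the target filesystem
        save_path = get_filepath_str(self._get_save_path(), self._protocol)

        # Open via fsspec (s3fs for s3:// paths) using the dataset's open_args_save
        with self._fs.open(save_path, **self._fs_open_args_save) as fs_file:
            data.write_html(fs_file, **self._save_args)

        # Drop cached filesystem info so later exists()/load() calls see the new file
        self._invalidate_cache()
```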

This worked fine… except the content type of the HTML file on S3 “ends up” being “binary/octet-stream” when it should be “text/html”. That becomes a problem when trying to display it in a browser. Anyone got experience with args you could pass here to set the content type manually? Not my area of expertise. Thanks, Viveca
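One direction that might be worth trying, not confirmed anywhere in this thread: as far as I know, s3fs's `open()` accepts extra keyword arguments and forwards them to the S3 upload call, so a `ContentType` entry in the open args (the `self._fs_open_args_save` dict used in the dataset, which Kedro fills from `fs_args.open_args_save` in the catalog) may set the object's content type. The helper `open_args_with_content_type` below is hypothetical; it only guesses the MIME type from the file extension with the standard library's `mimetypes` and merges it into a copy of the open args.

```python
import mimetypes
from typing import Any, Dict, Optional


def open_args_with_content_type(
    filepath: str, open_args: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return a copy of ``open_args`` with a ``ContentType`` guessed from ``filepath``.

    Hypothetical helper. Assumes the filesystem's ``open()`` (e.g. s3fs) passes
    ``ContentType`` through to S3; an explicitly provided ``ContentType`` wins.
    """
    args = dict(open_args or {})  # copy so the dataset's own dict is not mutated
    content_type, _encoding = mimetypes.guess_type(filepath)
    if content_type is not None:
        args.setdefault("ContentType", content_type)
    return args


# Possible use inside the dataset's _save (assumption, not verified against S3):
# open_args = open_args_with_content_type(save_path, self._fs_open_args_save)
# with self._fs.open(save_path, **open_args) as fs_file:
#     data.write_html(fs_file, **self._save_args)
```

The same idea would apply to the PNG case that comes up later in the thread, where the guessed type would be `image/png`.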

datajoely

04/12/2023, 4:26 PM

In your `self._fs.open(…` try adding `mode: 'wt'` so you’re writing text, not binary

datajoely

04/12/2023, 4:26 PM

https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.open

viveca

04/12/2023, 4:54 PM

thanks for the suggestion. I’ll try!

viveca

04/12/2023, 4:59 PM

unfortunately gave the same content type on s3 🤷‍♀️

viveca

04/13/2023, 6:48 AM

Turns out, btw, that the same thing happens for PNGs written with the Kedro MatplotlibWriter to S3: they get content type binary/octet-stream.

datajoely

04/13/2023, 7:57 AM

so the only two parameters I can think to tweak here are `mode` and `encoding` (which only applies to text)

datajoely

04/13/2023, 7:57 AM

so perhaps passing `utf-8` to `encoding` is what you need to coerce it into a text format
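To illustrate why `encoding` only applies to text: with Python's built-in `open()`, text mode (`'w'`/`'wt'`) encodes a `str` using `encoding`, while binary mode (`'wb'`) expects bytes and rejects an `encoding` argument outright. The sketch below writes to a local temporary file rather than S3. Either way the bytes written are the same UTF-8 data, which is consistent with the result reported in the thread that changing `mode` did not change the S3 content type; as far as I know, that is object metadata set at upload time rather than something inferred from how the file was opened.

```python
import os
import tempfile

html = "<html><body>plot</body></html>"

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text.html")
    binary_path = os.path.join(tmp, "binary.html")

    # Text mode: Python encodes the str with the given encoding before writing
    with open(text_path, "wt", encoding="utf-8") as f:
        f.write(html)

    # Binary mode: the caller encodes; open() takes no encoding argument here
    with open(binary_path, "wb") as f:
        f.write(html.encode("utf-8"))

    with open(text_path, "rb") as f:
        text_bytes = f.read()
    with open(binary_path, "rb") as f:
        binary_bytes = f.read()

    # Passing an encoding together with a binary mode is rejected by open() itself
    try:
        open(binary_path, "wb", encoding="utf-8")
        binary_encoding_error = None
    except ValueError as exc:
        binary_encoding_error = str(exc)

print(text_bytes == binary_bytes)  # True: identical bytes on disk either way
print(binary_encoding_error)
```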

viveca

05/05/2023, 3:59 PM

I’m slow on the response here 🙂 Actually, it turns out this is not specific to the custom HTML dataset I had above. I get the same issue when putting a PNG on S3 using the MatplotlibWriter dataset (it doesn’t get the right type, which in this case would be something like “image/png”).

