# questions
Hey, I have quite a large model (> 0.5 GB) that is retrained very rarely and lives on ADLS (abfss). I would like to download it once during a pipeline run, save it locally on the machine, and reuse it ➡️ WITHOUT ⬅️ redownloading it from the cloud during other pipeline runs. Unfortunately, as far as I know (and as we have tested), it's not possible to achieve this with CachedDataSet. Is there any way I can save some time on this operation?
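(For context: CachedDataSet caches in memory within a single run, so as far as I can tell it can't persist anything across runs. Below is a minimal sketch of the "use the local copy if present, otherwise download" idea the question describes, using fsspec directly; the URI, local path, and storage options are placeholders, and adlfs must be installed for the abfss:// protocol.)

```python
from pathlib import Path

import fsspec

# Placeholder locations -- adjust to your project layout and ADLS account.
LOCAL_COPY = Path("data/06_models/model.pkl")
REMOTE_URI = "abfss://container@account.dfs.core.windows.net/models/model.pkl"


def ensure_local_model(storage_options: dict) -> Path:
    """Download the model from ADLS only if no local copy exists yet."""
    if not LOCAL_COPY.exists():
        LOCAL_COPY.parent.mkdir(parents=True, exist_ok=True)
        fs, _, (remote_path,) = fsspec.get_fs_token_paths(
            REMOTE_URI, storage_options=storage_options
        )
        fs.get(remote_path, str(LOCAL_COPY))  # one-time ~0.5 GB transfer
    return LOCAL_COPY
```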
(attached file: Expiring dataset.py)
So I never contributed this, but a looooong time ago I built an ExpiringPickleDataSet that would cache results based on a time window, or re-request them if they expired. Something like that is probably the right call. This is really old code but may be helpful.
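(The attached file isn't reproduced in the thread. A minimal sketch of what such a dataset could look like, assuming the older kedro.io.AbstractDataSet interface; the class name, parameters, and behavior here are illustrative, not the attachment's actual contents:)

```python
import pickle
import time
from pathlib import Path
from typing import Any, Dict

from kedro.io import AbstractDataSet


class ExpiringPickleDataSet(AbstractDataSet):
    """Pickle dataset that is treated as missing once older than `expiry_seconds`."""

    def __init__(self, filepath: str, expiry_seconds: float = 24 * 3600):
        self._filepath = Path(filepath)
        self._expiry_seconds = expiry_seconds

    def _is_fresh(self) -> bool:
        # Fresh if the file exists and was modified within the expiry window.
        if not self._filepath.exists():
            return False
        return time.time() - self._filepath.stat().st_mtime < self._expiry_seconds

    def _load(self) -> Any:
        with self._filepath.open("rb") as f:
            return pickle.load(f)

    def _save(self, data: Any) -> None:
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        with self._filepath.open("wb") as f:
            pickle.dump(data, f)

    def _exists(self) -> bool:
        # Reporting an expired file as missing means an only-missing run
        # (e.g. `kedro run --only-missing` in older Kedro versions) will
        # re-run the node that produces it, i.e. re-request the data.
        return self._is_fresh()

    def _describe(self) -> Dict[str, Any]:
        return {
            "filepath": str(self._filepath),
            "expiry_seconds": self._expiry_seconds,
        }
```

(The key design choice in this sketch is `_exists`: by reporting a stale file as missing, the time window is enforced through Kedro's normal "output missing, run the upstream node" behavior rather than through custom scheduling logic.)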
k, thank you very much for this snippet. So there's no official way of dealing with this problem in pure Kedro? We managed to create something similar, but I still believe it's not the way to go.
no but if yours is any good we’d love a contribution!