# questions
Hey, I have quite a large model (> 0.5 GB) that is retrained very rarely and lives on ADLS (abfss). I would like to download it once during a pipeline run, save it locally on the machine, and reuse it ➡️ WITHOUT ⬅️ redownloading it from the cloud during other pipeline runs. Unfortunately, as far as I know (and as we have tested), it's not possible to achieve this with CachedDataSet. Is there any way I can save some time on this operation?
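(For context: CachedDataSet caches in memory within a single run, so as far as I can tell it can't persist anything across runs. Below is a minimal sketch of the "use the local copy if present, otherwise download" idea the question describes, using fsspec directly; the URI, local path, and storage options are placeholders, and adlfs must be installed for the abfss:// protocol.)

```python
from pathlib import Path

import fsspec

# Placeholder locations -- adjust to your project layout and ADLS account.
LOCAL_COPY = Path("data/06_models/model.pkl")
REMOTE_URI = "abfss://container@account.dfs.core.windows.net/models/model.pkl"


def ensure_local_model(storage_options: dict) -> Path:
    """Download the model from ADLS only if no local copy exists yet."""
    if not LOCAL_COPY.exists():
        LOCAL_COPY.parent.mkdir(parents=True, exist_ok=True)
        fs, _, (remote_path,) = fsspec.get_fs_token_paths(
            REMOTE_URI, storage_options=storage_options
        )
        fs.get(remote_path, str(LOCAL_COPY))  # one-time ~0.5 GB transfer
    return LOCAL_COPY
```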
(attached file: Expiring dataset.py)
So I never contributed this, but a looooong time ago I built an ExpiringPickleDataSet that would cache results based on a time window, or re-request them if they expired. Something like that is probably the right call. This is really old code but may be helpful.
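(The attached file isn't reproduced in the thread. A minimal sketch of what such a dataset could look like, assuming the older kedro.io.AbstractDataSet interface; the class name, parameters, and behavior here are illustrative, not the attachment's actual contents:)

```python
import pickle
import time
from pathlib import Path
from typing import Any, Dict

from kedro.io import AbstractDataSet


class ExpiringPickleDataSet(AbstractDataSet):
    """Pickle dataset that is treated as missing once older than `expiry_seconds`."""

    def __init__(self, filepath: str, expiry_seconds: float = 24 * 3600):
        self._filepath = Path(filepath)
        self._expiry_seconds = expiry_seconds

    def _is_fresh(self) -> bool:
        # Fresh if the file exists and was modified within the expiry window.
        if not self._filepath.exists():
            return False
        return time.time() - self._filepath.stat().st_mtime < self._expiry_seconds

    def _load(self) -> Any:
        with self._filepath.open("rb") as f:
            return pickle.load(f)

    def _save(self, data: Any) -> None:
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        with self._filepath.open("wb") as f:
            pickle.dump(data, f)

    def _exists(self) -> bool:
        # Reporting an expired file as missing means an only-missing run
        # (e.g. `kedro run --only-missing` in older Kedro versions) will
        # re-run the node that produces it, i.e. re-request the data.
        return self._is_fresh()

    def _describe(self) -> Dict[str, Any]:
        return {
            "filepath": str(self._filepath),
            "expiry_seconds": self._expiry_seconds,
        }
```

(The key design choice in this sketch is `_exists`: by reporting a stale file as missing, the time window is enforced through Kedro's normal "output missing, run the upstream node" behavior rather than through custom scheduling logic.)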
k, thank you very much for this snippet. So there's no official way of dealing with this problem in pure Kedro? We managed to create something similar, but I still believe it's not the way to go.
no but if yours is any good we’d love a contribution!