#questions

Damian Fiłonowicz

01/10/2023, 9:43 AM
Hey, I have quite a large model (> 0.5 GB) that is retrained very rarely and is stored on ADLS (abfss). I would like to download it once during the pipeline, save it locally on the machine, and reuse it ➡️ WITHOUT ⬅️ re-downloading it from the cloud during other pipeline runs. Unfortunately, as far as I know (and as we have tested), this is not achievable with CachedDataSet. Is there any way I can save some time on this operation?

datajoely

01/10/2023, 10:01 AM
Expiring dataset.py (attached snippet)
So I never contributed this, but a looooong time ago I built an ExpiringPickleDataSet that would cache results for a time window and re-download them once that window expired. Something like that is probably the right call. This is really old code, but it may be helpful.
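For reference, here is a minimal sketch of what such a dataset could look like. It is not the attached file: the class name matches, but the `local_cache` parameter, the TTL handling, and the fsspec-based download are assumptions, written against Kedro 0.18's AbstractDataSet API (reading abfss:// paths via fsspec additionally requires the adlfs package). Note that Kedro's built-in CachedDataSet only caches in memory for the duration of a single run, which is why it cannot avoid re-downloads across runs.

```python
import pickle
import time
from pathlib import Path
from typing import Any, Dict, Optional

import fsspec
from kedro.io import AbstractDataSet


class ExpiringPickleDataSet(AbstractDataSet):
    """Keeps a local pickled copy of a remote artifact and only re-downloads
    it when the copy is older than a time-to-live (TTL) window."""

    def __init__(
        self,
        filepath: str,             # remote path, e.g. abfss://container/model.pkl
        local_cache: str,          # where the local copy is kept between runs
        ttl_seconds: int = 86400,  # how long the local copy stays valid
        credentials: Optional[Dict[str, Any]] = None,
    ):
        self._filepath = filepath
        self._local = Path(local_cache)
        self._ttl = ttl_seconds
        self._storage_options = credentials or {}

    def _cache_is_fresh(self) -> bool:
        # Fresh = the local file exists and its mtime is within the TTL window.
        return (
            self._local.exists()
            and time.time() - self._local.stat().st_mtime < self._ttl
        )

    def _load(self) -> Any:
        if not self._cache_is_fresh():
            # Touch the cloud only when the cache is missing or expired.
            self._local.parent.mkdir(parents=True, exist_ok=True)
            with fsspec.open(self._filepath, "rb", **self._storage_options) as remote:
                self._local.write_bytes(remote.read())
        with self._local.open("rb") as f:
            return pickle.load(f)

    def _save(self, data: Any) -> None:
        # Write the remote copy first, then refresh the local cache to match.
        with fsspec.open(self._filepath, "wb", **self._storage_options) as remote:
            pickle.dump(data, remote)
        self._local.parent.mkdir(parents=True, exist_ok=True)
        with self._local.open("wb") as f:
            pickle.dump(data, f)

    def _describe(self) -> Dict[str, Any]:
        return {
            "filepath": self._filepath,
            "local_cache": str(self._local),
            "ttl_seconds": self._ttl,
        }
```

Registered in catalog.yml like any custom dataset (type pointing at your package's module), every run within the TTL window then loads the local copy instead of pulling the 0.5 GB model from ADLS again.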

Damian Fiłonowicz

01/10/2023, 1:08 PM
OK, thank you very much for this snippet. So there's no official way of dealing with this problem in pure Kedro? We managed to create something similar, but I still believe it's not the way to go.

datajoely

01/10/2023, 3:33 PM
No, but if yours is any good we'd love a contribution!