#questions

Dawid Bugajny

10/13/2023, 8:44 AM
Hello, I have a problem when using CSVDataSet to download data from Azure (abfs protocol): after 8h the operation is cancelled due to a timeout. This does not happen every time, but I would like to avoid it. Has anyone had this kind of problem? Is there a solution other than a try-catch implementation?

datajoely

10/13/2023, 8:57 AM
If this is the case I would think about subclassing the dataset and building in a retry
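A minimal sketch of what "subclassing the dataset and building in a retry" could look like. It assumes the dataset exposes a `_load` method (as Kedro datasets did at the time of this thread); `RetryingLoadMixin`, its attributes, and `FlakyDataset` are hypothetical names used only for illustration, and the stand-in class replaces a real CSVDataSet so the example runs without Azure access.

```python
import time


class RetryingLoadMixin:
    """Hypothetical mixin: retry `_load` a few times before giving up."""

    retry_attempts = 3
    retry_delay_seconds = 1.0
    retry_exceptions = (TimeoutError, ConnectionError)

    def _load(self):
        last_error = None
        for attempt in range(1, self.retry_attempts + 1):
            try:
                # Delegate to the real dataset's loader.
                return super()._load()
            except self.retry_exceptions as err:
                last_error = err
                if attempt < self.retry_attempts:
                    # Linear back-off between attempts.
                    time.sleep(self.retry_delay_seconds * attempt)
        raise last_error


class FlakyDataset:
    """Stand-in for a real dataset: fails twice, then succeeds."""

    def __init__(self):
        self.calls = 0

    def _load(self):
        self.calls += 1
        if self.calls < 3:
            raise TimeoutError("simulated timeout")
        return "data"


class RetryingFlakyDataset(RetryingLoadMixin, FlakyDataset):
    retry_delay_seconds = 0.0  # no waiting in this demo


dataset = RetryingFlakyDataset()
result = dataset._load()
```

With a real dataset the mixin would go in front of the dataset class in the bases list. A retry only helps once the call actually fails, so it is probably worth pairing with a much shorter client-side timeout than 8h.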

marrrcin

10/13/2023, 9:00 AM
Do you have a CSV so large that it takes >8h to read it?

datajoely

10/13/2023, 9:00 AM
oh I misread that!
I thought it said 8th!
good point!

Dawid Bugajny

10/13/2023, 9:07 AM
The dataset is only 37.6 MB, so it isn't that large.

datajoely

10/13/2023, 9:34 AM
yeah this should take milliseconds to complete
Something I like to do to prototype this sort of thing:
• open a notebook
• import: from kedro_datasets.pandas import CSVDataSet
• construct the dataset there to test the right configuration
All Kedro does behind the scenes is pass your YAML config to importlib, so this is a really nice way to tinker with the config quickly
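To illustrate the "pass your YAML config to importlib" idea, here is a simplified sketch of the mechanism, not Kedro's actual code (Kedro's real lookup does more, such as resolving short type names). `build_from_config` is a hypothetical helper, and a standard-library class stands in for the dataset so the example runs anywhere.

```python
import importlib


def build_from_config(config):
    """Instantiate the class named by config["type"], passing the other keys as kwargs."""
    params = dict(config)  # copy so the caller's dict is not mutated
    module_path, _, class_name = params.pop("type").rpartition(".")
    module = importlib.import_module(module_path)
    cls = getattr(module, class_name)
    return cls(**params)


# Stand-in for a catalog entry after the YAML has been parsed into a dict.
example_config = {"type": "datetime.timedelta", "hours": 8}
timeout = build_from_config(example_config)
```

In a notebook the same experiment can be done directly: construct the dataset class with the keys from the catalog entry (for example `filepath` and `credentials`) and call `load()` to see whether the configuration behaves as expected.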
šŸ‘ 1

marrrcin

10/13/2023, 9:42 AM
I would verify the networking and configuration (of the execution environment, not Kedro) first, then plain Python with fsspec, then Kedro
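A rough sketch of the "plain Python with fsspec" step: time a raw read of the file outside Kedro. `timed_read` is a hypothetical helper; the `opener` parameter exists only so the function can be exercised without fsspec or Azure access. Reading from `abfs://` additionally assumes the `adlfs` package is installed, and the path and credentials in the usage note are placeholders.

```python
import time


def timed_read(path, opener=None, **storage_options):
    """Read all bytes from `path`; return (number of bytes, elapsed seconds)."""
    if opener is None:
        # Imported lazily so the helper can be tried without fsspec installed.
        import fsspec
        opener = fsspec.open
    start = time.perf_counter()
    with opener(path, "rb", **storage_options) as f:
        data = f.read()
    elapsed = time.perf_counter() - start
    return len(data), elapsed
```

Against Azure this might be called as `timed_read("abfs://<container>/<path>.csv", account_name="<name>", account_key="<key>")`. If this already hangs or is slow, the problem sits in networking or credentials rather than in Kedro.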
šŸ‘ 1

Dawid Bugajny

10/13/2023, 9:45 AM
Okay, thanks 🙂