# questions
d
Hello, I have a problem when using CSVDataSet and downloading data from Azure (abfs protocol): after 8 h the operation is cancelled due to a timeout. This does not happen every time, but I would like to avoid the situation. Has anyone had this kind of problem? Is there a solution other than a try-catch implementation?
d
If this is the case, I would think about subclassing the dataset and building in a retry
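A minimal sketch of the retry idea, assuming you narrow the caught exception to the timeout you actually see; `with_retries` and `RetryingCSVDataSet` are illustrative names, not a Kedro API, and the `kedro_datasets` import in the comment assumes that package is installed:

```python
import time

def with_retries(fn, attempts=3, backoff_s=2.0):
    """Call fn(), retrying on failure with a simple linear backoff.

    Illustrative helper: a subclassed dataset could wrap its _load
    with this so a transient timeout does not fail the whole run.
    """
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # in practice, narrow this to the timeout error
            last_exc = exc
            if attempt < attempts:
                time.sleep(backoff_s * attempt)
    raise last_exc

# Sketch of the subclass (hypothetical; assumes kedro_datasets is installed):
#
#   from kedro_datasets.pandas import CSVDataSet
#
#   class RetryingCSVDataSet(CSVDataSet):
#       def _load(self):
#           return with_retries(super()._load)
```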
m
Do you have a CSV so large that it takes >8h to read it?
d
oh I misread that!
I thought it said 8th!
good point!
d
The dataset is only 37.6 MB, so it isn't that large.
d
yeah this should take milliseconds to complete
Something I like to do to prototype these sorts of things:
• open a notebook
• import the dataset class:
from kedro_datasets.pandas import CSVDataSet
• construct the dataset there to test the right configuration
All Kedro does behind the scenes is pass your YAML config to importlib, so this is a really nice way to tinker with the config quickly
šŸ‘ 1
m
I would verify the networking and configuration (of the execution environment, not Kedro) first; then try plain Python with fsspec, then Kedro
šŸ‘ 1
d
Okay, thanks 🙂