# questions
d
Hello, I have a problem when using CSVDataSet and downloading data from Azure (abfs protocol): after 8 h the operation is cancelled due to a timeout. This does not happen every time, but I would like to avoid the situation. Has anyone had this kind of problem? Is there a solution other than a try-catch implementation?
d
If this is the case, I would think about subclassing the dataset and building in a retry
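A minimal sketch of the retry idea, assuming you narrow the caught exception to the timeout you actually see; `with_retries` and `RetryingCSVDataSet` are illustrative names, not a Kedro API, and the `kedro_datasets` import in the comment assumes that package is installed:

```python
import time

def with_retries(fn, attempts=3, backoff_s=2.0):
    """Call fn(), retrying on failure with a simple linear backoff.

    Illustrative helper: a subclassed dataset could wrap its _load
    with this so a transient timeout does not fail the whole run.
    """
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # in practice, narrow this to the timeout error
            last_exc = exc
            if attempt < attempts:
                time.sleep(backoff_s * attempt)
    raise last_exc

# Sketch of the subclass (hypothetical; assumes kedro_datasets is installed):
#
#   from kedro_datasets.pandas import CSVDataSet
#
#   class RetryingCSVDataSet(CSVDataSet):
#       def _load(self):
#           return with_retries(super()._load)
```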
m
Do you have a CSV so large that it takes >8h to read it?
d
oh I misread that!
I thought it said 8th!
good point!
d
The dataset is only 37.6 MB, so it isn't that large.
d
yeah this should take milliseconds to complete
Something I like to do to prototype these sorts of things:
• open a notebook
• import the dataset class:
from kedro_datasets.pandas import CSVDataSet
• construct the dataset there to test the right configuration
All Kedro does behind the scenes is pass your YAML config to importlib, so this is a really nice way to tinker with the config quickly
šŸ‘ 1
m
I would verify the networking and configuration (of the execution environment, not Kedro) first; then try plain Python with fsspec, then Kedro
šŸ‘ 1
d
Okay, thanks 🙂