Marc Gris
11/23/2023, 7:50 AM
I'm using a `PartitionedDataset` of `ImageDataset` to handle the classic use case of having thousands of images to download from different locations and then centralizing them in S3.
I first went with the “eager” option and created a node that returned a dict mapping each filename to an `Image`.
I quickly realized that this was not ideal: if a single download failed, the node and the pipeline would fail, and all the images that had been downloaded successfully were lost.
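A minimal sketch of a failure-tolerant version of the eager node follows. It assumes a hypothetical `downloader` callable (URL to image) that you supply yourself. Each download gets its own try/except, failures are logged and collected, and only successful images end up in the returned dict. This is one common workaround, not necessarily the "kedro-ic" answer, and it still keeps every image in memory until the node returns.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger(__name__)


def download_all(
    urls: Dict[str, str],
    downloader: Callable[[str], Any],
) -> Tuple[Dict[str, Any], List[str]]:
    """Eagerly download every image, skipping (and recording) the ones that fail.

    `downloader` is a hypothetical url -> image callable; plug in your own
    HTTP + image-decoding logic.
    """
    images: Dict[str, Any] = {}
    failed: List[str] = []
    for filename, url in urls.items():
        try:
            image = downloader(url)
        except Exception:
            # One bad URL should not abort the whole batch: log it and move on.
            logger.exception("Failed to download %s from %s", filename, url)
            failed.append(filename)
            continue
        if image is None:
            # Never hand None to the dataset, which refuses to save it.
            failed.append(filename)
            continue
        images[filename] = image
    return images, failed
```

In Kedro, a function that returns a tuple can back a node with two outputs (for example `outputs=["downloaded_images", "failed_downloads"]`, both hypothetical dataset names). The dict can then go to the `PartitionedDataset` and the list of failures to a separate dataset for a later retry.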
I therefore switched to the “lazy” option and created a node that returned a dict mapping each filename to a callable that returns an `Image`.
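For reference, here is a minimal sketch of the lazy variant. It assumes Kedro's `PartitionedDataset` lazy saving, where a callable partition value is invoked only at save time, and the same kind of hypothetical `downloader` callable. `functools.partial` binds each URL immediately. Writing `lambda: downloader(url)` inside the loop would make every callable download the last URL, because Python closures look the variable up late.

```python
from functools import partial
from typing import Any, Callable, Dict


def make_lazy_partitions(
    urls: Dict[str, str],
    downloader: Callable[[str], Any],
) -> Dict[str, Callable[[], Any]]:
    """Map each filename to a zero-argument callable that downloads it on demand.

    `downloader` is a hypothetical url -> image callable. Nothing is downloaded
    here; the dataset triggers each download when it saves that partition.
    """
    # partial(downloader, url) freezes the current url, unlike a bare lambda in a loop.
    return {filename: partial(downloader, url) for filename, url in urls.items()}
```

Because the work happens at save time, any exception raised by a callable, or any `None` it returns, only surfaces while the dataset is writing. This is consistent with the save error described in this thread.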
But here again I am facing an issue: in case of failure, the pipeline crashes with `DatasetError: Saving 'None' to a 'Dataset' is not allowed`.
Granted… at least there is a silver lining: what has been downloaded is not lost. But still… not great.
So… what is the “proper” / “kedro-ic” way of doing such tasks?
Many thanks in advance for your help,
Regards,
Marc

Nok Lam Chan
11/23/2023, 8:08 AM

Marc Gris
11/23/2023, 8:25 AM

Nok Lam Chan
11/23/2023, 8:30 AM
`PartitionedDataset` or `ImageDataSet`
Marc Gris
11/23/2023, 8:34 AM