# questions

Edouard Charvet

10/20/2023, 5:27 AM
Good morning, I'm very new to Kedro and machine learning in general, so sorry if I say something stupid.

I'm trying a new classification project where I want to use a very large dataset for training. I want the images to be loaded on the fly and managed as required by TensorFlow, to avoid saturating the RAM, so I will create a pipeline for that. My dataset is currently made of the following:
• metadata.csv: contains 2 columns, label + img_path (relative path from that file to the corresponding .png file)
• img/*.png: subfolder containing all the images

In Kedro, I added the dataset in data/01_raw and created a new dataset in catalog.yml pointing to the CSV file using the pandas.CSVDataset loader. In my pipeline, I get the dataset content and start building the pipeline. I want to map a function that will load the image from the file.

But the problem is that the path I have for each example is only relative to the metadata.csv file, so to load it I would probably need to make it absolute or something like that. So I'm wondering how I can retrieve, from inside the Kedro node, the path to the metadata.csv file so I can prepend it to the image's relative path. I'm thinking of adding a parameter, but that is a bit stupid as it would duplicate the path location between the parameters.yml file and the catalog.yml file...

Any better solution? Or should I restructure my dataset differently? Thanks for your opinions,
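For reference, the path arithmetic described in the question can be sketched in a few lines. This is only an illustration, not a Kedro API: `resolve_image_paths` is a hypothetical helper, the image file names are made up, and it assumes the location of metadata.csv is somehow available to the node as a plain string (which is exactly the open question here).

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def resolve_image_paths(metadata_csv: str, relative_paths: Iterable[str]) -> List[str]:
    """Join each img_path onto the directory that contains metadata.csv.

    Hypothetical helper: `metadata_csv` is assumed to be passed in by the caller.
    """
    base_dir = PurePosixPath(metadata_csv).parent
    return [str(base_dir / rel) for rel in relative_paths]


# Example with made-up file names under the data/01_raw folder from the question.
resolved = resolve_image_paths(
    "data/01_raw/metadata.csv",
    ["img/cat_001.png", "img/dog_002.png"],
)
print(resolved)
```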

Ankita Katiyar

10/23/2023, 8:28 AM
Hi @Edouard Charvet You might find the tutorial on creating a custom dataset useful for your use case. It walks through creating a custom ImageDataset, which is then used with PartitionedDataset (you don’t need to follow the creation of the ImageDataset part because it already exists; it is just to give you an idea about how to combine Image and Partitioned Dataset).
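A rough sketch of how a node might combine the two inputs, under stated assumptions: a partitioned dataset hands the node a dictionary mapping partition ids to load callables, so nothing is read until a callable is invoked. Here `iter_labelled_images` is a hypothetical node function, and the partition ids are assumed to match the img_path column exactly; in a real project that depends on how the dataset's path and filename suffix are configured in catalog.yml.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple


def iter_labelled_images(
    records: Iterable[Tuple[str, str]],
    partitions: Dict[str, Callable[[], Any]],
) -> Iterator[Tuple[str, Any]]:
    """Yield (label, image) pairs, loading each image only when it is requested.

    `records` holds (label, img_path) pairs; from a DataFrame they could come from
    metadata[["label", "img_path"]].itertuples(index=False).
    Assumption: every img_path is also a key of `partitions`.
    """
    for label, img_path in records:
        load_image = partitions[img_path]  # a callable, nothing is loaded yet
        yield label, load_image()          # the image is read here, on demand


# Stand-in data: the lambdas play the role of the lazy loaders.
loaded = []
fake_partitions = {
    "img/a.png": lambda: loaded.append("a") or "image-a",
    "img/b.png": lambda: loaded.append("b") or "image-b",
}
pairs = iter_labelled_images([("cat", "img/a.png"), ("dog", "img/b.png")], fake_partitions)
first = next(pairs)
loaded_after_first = list(loaded)
rest = list(pairs)
```

Because the function is a generator, only one image is held at a time, and it could in principle be wrapped with `tf.data.Dataset.from_generator` to feed training.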

Edouard Charvet

10/23/2023, 10:28 AM
Thanks for the feedback, I'll have a look into that