# questions
e
Good morning, I'm very new to Kedro and to machine learning in general, so sorry if I say something stupid.

I'm starting a new classification project where I want to train on a very large dataset. I want the images to be loaded on the fly and managed by TensorFlow as needed, to avoid saturating the RAM, so I will create a tf.data pipeline. My dataset currently consists of the following:
• metadata.csv : contains 2 columns : label + img_path (relative path from that file to the corresponding .png file)
• img/*.png : subfolder containing all the images

In Kedro, I added the dataset in data/01_raw and created a new dataset in catalog.yml pointing to the CSV file using the pandas.CSVDataset loader. In my pipeline, I load the dataset content and start building the tf.data pipeline. I want to map a function over the dataset (tf.data.Dataset.map) that loads each image from its file using tf.io.read_file. The problem is that the path I have for each example is only relative to the metadata.csv file, so to load it I would probably need to make it absolute or something like that.

So I'm wondering how I can retrieve, from inside the Kedro node, the path to the metadata.csv file so that I can prepend it to the relative image path. I'm thinking of adding a parameter, but that seems a bit silly because it would duplicate the path location between the parameters.yml file and the catalog.yml file...

Is there a better solution? Or should I structure my dataset differently? Thanks for your opinions,
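For illustration, here is a minimal sketch of the path-joining step described above. It assumes the directory holding metadata.csv reaches the node as an argument (for example through the parameter the question considers); the function name and the sample values are hypothetical and not part of Kedro or TensorFlow.

```python
from pathlib import Path


def resolve_image_paths(relative_paths, base_dir):
    """Join each img_path from metadata.csv onto the directory holding the CSV."""
    base = Path(base_dir).resolve()
    return [str(base / rel) for rel in relative_paths]


# Hypothetical values: in a Kedro node, relative_paths would come from the
# img_path column of the loaded DataFrame, and base_dir from a parameter.
absolute_paths = resolve_image_paths(
    ["img/0001.png", "img/0002.png"], "data/01_raw"
)
```

The resulting list of strings could then be handed to tf.data.Dataset.from_tensor_slices, with tf.io.read_file applied inside the mapped function, so images are still read lazily.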
a
Hi @Edouard Charvet, you might find `PartitionedDataset` useful for your use case: https://docs.kedro.org/en/stable/data/partitioned_and_incremental_datasets.html

This is a tutorial on creating a custom `ImageDataset`, which is then used with `PartitionedDataset`: https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html#integration-with-partitioneddataset
> (you don't need to follow the part about creating the `ImageDataset` because it already exists; it is just to give you an idea of how to combine the image and partitioned datasets)
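As a rough sketch of how a node might consume such a dataset: on load, `PartitionedDataset` gives the node a dictionary mapping each partition id to a function that loads that partition only when called. The helper below is hypothetical, and it assumes the partition ids are the image file names without the `.png` suffix, which depends on how the catalog entry is configured. Plain lambdas stand in for the real load functions so the example runs without Kedro.

```python
from pathlib import PurePosixPath


def match_labels_to_partitions(rows, partitions):
    """Pair each (label, img_path) row with the lazy loader for that image.

    rows: iterable of (label, img_path) tuples taken from metadata.csv.
    partitions: dict of partition id -> zero-argument load function.
    """
    pairs = []
    for label, img_path in rows:
        # ASSUMPTION: partition ids are file names with the suffix stripped.
        key = PurePosixPath(img_path).stem
        pairs.append((label, partitions[key]))
    return pairs


# Stand-ins for the loaders Kedro would provide; nothing is read from disk.
fake_partitions = {"0001": lambda: "pixels-1", "0002": lambda: "pixels-2"}
rows = [("cat", "img/0001.png"), ("dog", "img/0002.png")]

pairs = match_labels_to_partitions(rows, fake_partitions)
first_label, first_loader = pairs[0]
```

Because each loader is only called on demand, no image has to be in memory before it is needed, which matches the on-the-fly requirement in the question.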
e
Thanks for the feedback, I'll have a look into that