Nicolas Betancourt Cardona
10/02/2024, 3:17 PM
... PartitionedDataset catalog entries in catalog.yml, such as:
audio_folder:
  type: partitions.PartitionedDataset
  dataset: my_kedro_project.datasets.audio_dataset.SoundDataset
  path: data/output/audios/
  filename_suffix: ".WAV"
The next level of abstraction I would require is to be able to create a catalog entry corresponding to a folder containing folders such as the audio_folder above. Here is my attempt, but I'm having an issue with the _save method:
import os
from pathlib import PurePosixPath

import fsspec
from kedro.io.core import get_protocol_and_path
from kedro_datasets.partitions import PartitionedDataset

# SoundDataset is the custom dataset referenced in catalog.yml
# (my_kedro_project.datasets.audio_dataset.SoundDataset).


class AudioFolderDataset(PartitionedDataset):
    def __init__(self, main_folder_path: str):
        """Creates a new instance of AudioFolderDataset to load / save a folder of audio folders.

        Args:
            main_folder_path: The location of the main folder that contains the audio subfolders.
        """
        protocol, mainfolderpath = get_protocol_and_path(main_folder_path)
        self._protocol = protocol
        self._mainfolderpath = PurePosixPath(mainfolderpath)
        self._fs = fsspec.filesystem(self._protocol)

    def _load(self, subfolders_dictionary):
        # loading code
        ...

    def _save(self, subfolders_dictionary):
        # Note: the result of normpath is discarded, so this line has no effect.
        os.path.normpath(self._mainfolderpath)
        for subfolder_name in subfolders_dictionary.keys():
            # On Windows, os.path.join inserts "\" between the two parts.
            subfolder_path = os.path.join(self._mainfolderpath, subfolder_name)
            partitioned_dataset = PartitionedDataset(
                path=subfolder_path,
                dataset=SoundDataset,
                filename_suffix=".WAV",
            )
            partitioned_dataset.save(subfolders_dictionary[subfolder_name])

    def _describe(self):
        # describe code
        ...
The problem is that I'm working on Windows, but it seems that PartitionedDataset assumes that my system separator is / instead of \. When I print the path in the _save method of the SoundDataset class, I get folder\\subfolder/file.WAV, which of course leads to an error.
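A minimal sketch of how such a mixed path can arise, assuming the subfolder path is built with Windows rules (as os.path.join does on Windows) and the file name is then appended with "/". ntpath is used here so the snippet behaves the same on any OS; the folder and file names are the ones from the printed path, and the "/" join is an assumption based on that output rather than a statement about PartitionedDataset internals. Joining with POSIX rules instead keeps the separators consistent.

```python
import ntpath      # Windows flavour of os.path, importable on any OS
import posixpath   # POSIX flavour of os.path
from pathlib import PurePosixPath

main_folder = "folder"
subfolder = "subfolder"

# Equivalent to os.path.join(main_folder, subfolder) on Windows.
windows_joined = ntpath.join(main_folder, subfolder)

# Assumption: the file name is appended with "/" afterwards.
mixed_path = f"{windows_joined}/file.WAV"
print(repr(mixed_path))  # 'folder\\subfolder/file.WAV'

# Building the subfolder path with POSIX rules avoids the mix.
consistent_path = f"{posixpath.join(main_folder, subfolder)}/file.WAV"
print(consistent_path)  # folder/subfolder/file.WAV

# The same result with a pure path object.
pure_path = (PurePosixPath(main_folder) / subfolder / "file.WAV").as_posix()
print(pure_path)  # folder/subfolder/file.WAV
```

In the _save method above, this would correspond to building subfolder_path from self._mainfolderpath (already a PurePosixPath) with the / operator instead of os.path.join.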
Is there a way in which I can change this default behaviour?
Nok Lam Chan
10/02/2024, 5:23 PM
Nok Lam Chan
10/02/2024, 5:25 PM
If you use pathlib.Path, you should be able to handle these paths properly regardless of your OS.
Nok Lam Chan
10/02/2024, 5:27 PM
... PartitionedDataset at the same time.
I would approach this differently, since you mentioned that a folder of files is considered a single "Dataset".
1. Keep PartitionedDataset if it's flexible enough for you; otherwise extend it to iterate over folders however you need.
2. Implement your own AudioDataset that loads a single folder as one dataset.
Nicolas Betancourt Cardona
10/02/2024, 6:57 PM
Subclassing AbstractDataset to load and save folders of folders using dictionaries of dictionaries did what I needed. Thank you for your help.
Nok Lam Chan
10/02/2024, 7:28 PM
Nicolas Betancourt Cardona
10/02/2024, 8:05 PM
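The approach that resolved the thread, a custom dataset that loads and saves folders of folders as dictionaries of dictionaries, might look roughly like the sketch below. It is not the code that was actually written: the class name FolderOfFoldersDataset, the helper round_trip_demo and the sample keys are hypothetical, and plain bytes stand in for audio so the snippet runs without Kedro. In a Kedro project the class would subclass AbstractDataset, the three methods would map onto _load, _save and _describe as in the class earlier in the thread, and each file would be read and written by SoundDataset.

```python
import tempfile
from pathlib import Path


class FolderOfFoldersDataset:
    """Hypothetical sketch: stores {subfolder: {partition_id: bytes}} on disk."""

    def __init__(self, main_folder_path: str, filename_suffix: str = ".WAV"):
        # pathlib picks the right separator for the current OS.
        self._main = Path(main_folder_path)
        self._suffix = filename_suffix

    def save(self, data: dict) -> None:
        # Outer keys become subfolders, inner keys become file names.
        for subfolder_name, partitions in data.items():
            subfolder = self._main / subfolder_name
            subfolder.mkdir(parents=True, exist_ok=True)
            for partition_id, payload in partitions.items():
                (subfolder / f"{partition_id}{self._suffix}").write_bytes(payload)

    def load(self) -> dict:
        # Rebuild the dictionary of dictionaries from the folder tree.
        result = {}
        for subfolder in sorted(p for p in self._main.iterdir() if p.is_dir()):
            result[subfolder.name] = {
                f.name.removesuffix(self._suffix): f.read_bytes()
                for f in sorted(subfolder.iterdir())
                if f.is_file() and f.name.endswith(self._suffix)
            }
        return result

    def describe(self) -> dict:
        return {
            "main_folder_path": self._main.as_posix(),
            "filename_suffix": self._suffix,
        }


def round_trip_demo() -> dict:
    """Save a small nested dictionary to a temporary folder and load it back."""
    sample = {
        "site_a": {"rec1": b"\x00\x01", "rec2": b"\x02"},
        "site_b": {"rec3": b"\x03"},
    }
    with tempfile.TemporaryDirectory() as tmp:
        dataset = FolderOfFoldersDataset(tmp)
        dataset.save(sample)
        return dataset.load()
```

Because every path is built with pathlib, no separator is hard-coded, which sidesteps the mixed \ and / problem from the original question.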