Deepyaman Datta
01/09/2023, 2:55 PM
PartitionedDataSet
...
Let's say I have a catalog entry like:
my_pds:
  type: PartitionedDataSet
  path: data/01_raw/subjects
  dataset:
    type: my_project.io.MyCustomDataSet
And data like:
data/01_raw/subjects/C001/scans/0.png
data/01_raw/subjects/C001/scans/1.png
data/01_raw/subjects/C001/scans/2.png
data/01_raw/subjects/C001/test_results.csv
data/01_raw/subjects/C001/notes.png
data/01_raw/subjects/C002/scans/0.png
data/01_raw/subjects/C002/scans/1.png
data/01_raw/subjects/C002/scans/2.png
data/01_raw/subjects/C002/test_results.csv
data/01_raw/subjects/C002/notes.png
data/01_raw/subjects/T001/scans/0.png
data/01_raw/subjects/T001/scans/1.png
data/01_raw/subjects/T001/scans/2.png
data/01_raw/subjects/T001/test_results.csv
data/01_raw/subjects/T001/notes.png
What do you think the resulting partitions would be?
Jordan
01/09/2023, 11:28 PM
Deepyaman Datta
01/10/2023, 1:37 AM
PartitionedDataSet
on several occasions in the past. What happens is that every file under there (at any level) becomes its own partition, because the files are found recursively, rather than each top-level file or folder becoming one partition keyed by its name. I kinda expected the latter, but it may just be me.
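To make the difference concrete, here is a small plain-Python sketch (it does not import or call Kedro) that lists the example files both ways. `recursive_partitions` mimics the behaviour described above, assuming each partition ID is the file's path relative to the dataset `path`. `top_level_partitions` is a hypothetical grouping by the first folder under `path`, i.e. the behaviour that was expected. Both function names are made up for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# File layout copied from the example above.
BASE = "data/01_raw/subjects"
FILES = [
    f"{BASE}/{subject}/{name}"
    for subject in ("C001", "C002", "T001")
    for name in ("scans/0.png", "scans/1.png", "scans/2.png",
                 "test_results.csv", "notes.png")
]


def recursive_partitions(base, files):
    """Mimics the observed behaviour: every file at any depth is its own
    partition, keyed (by assumption) by its path relative to base."""
    root = PurePosixPath(base)
    return sorted(str(PurePosixPath(f).relative_to(root)) for f in files)


def top_level_partitions(base, files):
    """Hypothetical alternative: one partition per top-level entry under
    base, holding all the relative file paths beneath it."""
    root = PurePosixPath(base)
    groups = defaultdict(list)
    for f in files:
        rel = PurePosixPath(f).relative_to(root)
        groups[rel.parts[0]].append(str(rel))
    return {key: sorted(paths) for key, paths in sorted(groups.items())}


if __name__ == "__main__":
    flat = recursive_partitions(BASE, FILES)
    print(len(flat), flat[:3])
    grouped = top_level_partitions(BASE, FILES)
    print(list(grouped))
```

With this layout the recursive version yields 15 partitions (one per file, such as `C001/scans/0.png`), while the grouped version yields three: `C001`, `C002` and `T001`.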