Not exactly a support question, but for people who use/have considered using PartitionedDataSet (Kedro #questions)

Deepyaman Datta

01/09/2023, 2:55 PM

Not exactly a support question, but for people who use/have considered using PartitionedDataSet... Let's say I have a catalog entry like:

my_pds:
  type: PartitionedDataSet
  path: data/01_raw/subjects
  dataset:
    type: my_project.io.MyCustomDataSet

And data like:

data/01_raw/subjects/C001/scans/0.png
data/01_raw/subjects/C001/scans/1.png
data/01_raw/subjects/C001/scans/2.png
data/01_raw/subjects/C001/test_results.csv
data/01_raw/subjects/C001/notes.png
data/01_raw/subjects/C002/scans/0.png
data/01_raw/subjects/C002/scans/1.png
data/01_raw/subjects/C002/scans/2.png
data/01_raw/subjects/C002/test_results.csv
data/01_raw/subjects/C002/notes.png
data/01_raw/subjects/T001/scans/0.png
data/01_raw/subjects/T001/scans/1.png
data/01_raw/subjects/T001/scans/2.png
data/01_raw/subjects/T001/test_results.csv
data/01_raw/subjects/T001/notes.png

What do you think the resulting partitions would be?

Jordan

01/09/2023, 11:28 PM

This is something I’ve questioned as well when I have data nested at different levels. I think in my case I ended up avoiding the problem by putting everything in the same working directory and placing the folder info into the file names themselves. But in my case, all the data was the same type
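For anyone wanting to try the same workaround, here is a minimal sketch of folding the folder info into the file name. The helper name flatten_name and the "__" separator are hypothetical choices for illustration, and the example path is borrowed from the layout earlier in the thread rather than from the data described in this message.

```python
from pathlib import PurePosixPath


def flatten_name(path, root, sep="__"):
    """Turn a nested path under `root` into a single flat file name.

    Hypothetical helper: the separator is an arbitrary choice and should
    be something that never appears in the original folder or file names.
    """
    relative = PurePosixPath(path).relative_to(root)
    return sep.join(relative.parts)


print(flatten_name("data/01_raw/subjects/C001/scans/0.png", "data/01_raw/subjects"))
# prints "C001__scans__0.png"
```

With every file sitting directly in one directory, the nesting question does not come up, at the cost of having to parse the folder info back out of the name later.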

Deepyaman Datta

01/10/2023, 1:37 AM

I was curious because I came across this while helping somebody yesterday, and the behavior was different than I intuitively expected, despite having used PartitionedDataSet on several occasions in the past. What happens is that every file under there (at any level) becomes a partition--a result of finding every file under there recursively--rather than using the top-level file or folder as the partition key. I kinda expected the latter, but may just be me.
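To make the difference concrete, here is a rough sketch in plain Python (no Kedro import) that mimics the two behaviors for the example layout above. The function names are made up for illustration, and it assumes the partition ID is the file path relative to the catalog entry's path with no filename_suffix configured; treat it as a simulation of the behavior described, not as Kedro's actual implementation.

```python
from collections import defaultdict
from pathlib import PurePosixPath

ROOT = "data/01_raw/subjects"
NAMES = ("scans/0.png", "scans/1.png", "scans/2.png", "test_results.csv", "notes.png")
FILES = [f"{ROOT}/{subject}/{name}" for subject in ("C001", "C002", "T001") for name in NAMES]


def recursive_partition_ids(files, root):
    """Described behavior: one partition per file found at any depth."""
    return sorted(str(PurePosixPath(f).relative_to(root)) for f in files)


def top_level_partition_ids(files, root):
    """Intuitive expectation: one partition per top-level entry under root."""
    groups = defaultdict(list)
    for f in files:
        relative = PurePosixPath(f).relative_to(root)
        groups[relative.parts[0]].append(str(relative))
    return dict(groups)


observed = recursive_partition_ids(FILES, ROOT)
expected = top_level_partition_ids(FILES, ROOT)

print(len(observed))     # 15 partitions, e.g. "C001/scans/0.png"
print(sorted(expected))  # ['C001', 'C002', 'T001']
```

Note that under the recursive behavior the single dataset type in the catalog entry (my_project.io.MyCustomDataSet here) would have to handle both the .png and the .csv files, since each one is loaded as its own partition.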
