Kedro #user-research

Deepyaman Datta

10/22/2023, 1:32 PM

Hello to all of our wonderful `PartitionedDataset` users out there! We have a question for you, related to enabling versioning for `PartitionedDataset`: which of the below options makes the most sense to you?

1. https://github.com/kedro-org/kedro/pull/521 proposes to enable versioning of the underlying dataset, by specifying `versioned: true` in the dataset config:

station_data:
  type: PartitionedDataset
  path: data/03_primary/station_data
  dataset:
    type: pandas.CSVDataset
    versioned: true

On the plus side, having the `versioned: true` config on the `dataset` config makes it clear that the versioning is applied to the underlying dataset, not to the `PartitionedDataset`. However, there are some edge cases (see https://github.com/kedro-org/kedro/pull/521#issuecomment-744653023, if you're keen).

2. Alternatively, we can move the `versioned: true` flag to the top level `PartitionedDataset` config:

station_data:
  type: PartitionedDataset
  path: data/03_primary/station_data
  versioned: true
  dataset:
    type: pandas.CSVDataset

Note that the versioning is still of the underlying dataset (e.g. `data/03_primary/station_data/first_station.csv/<version>/first_station.csv`), even though the config is at the top level.

3. None of these options make sense; what you really need is versioning of the top-level dataset. (Note that we don't have a solution designed for this case, but it would be great to know nonetheless!)

Please feel free to vote using 1️⃣2️⃣3️⃣, and elaborate further on your thoughts in the thread below!
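For anyone weighing options 1 and 2, here is a rough sketch (plain Python, not Kedro code) of what they have in common: the flag lives in a different place, but the on-disk layout of each partition is the same. The helper names `versioning_requested` and `versioned_partition_path` and the version string are made up for illustration; nothing here reflects how Kedro would actually implement either option.

```python
from pathlib import PurePosixPath


def versioning_requested(config: dict) -> bool:
    """Hypothetical check: is `versioned: true` set in either proposed location?"""
    top_level = bool(config.get("versioned", False))  # option 2: flag on PartitionedDataset
    dataset = config.get("dataset", {})
    # option 1: flag on the underlying dataset config (which may also be a plain string)
    nested = isinstance(dataset, dict) and bool(dataset.get("versioned", False))
    return top_level or nested


def versioned_partition_path(base: str, partition: str, version: str) -> PurePosixPath:
    """Build <base>/<partition>/<version>/<partition>, the layout in the example above."""
    return PurePosixPath(base) / partition / version / partition


option_1 = {
    "type": "PartitionedDataset",
    "path": "data/03_primary/station_data",
    "dataset": {"type": "pandas.CSVDataset", "versioned": True},
}
option_2 = {
    "type": "PartitionedDataset",
    "path": "data/03_primary/station_data",
    "versioned": True,
    "dataset": {"type": "pandas.CSVDataset"},
}

example_version = "example-version"  # placeholder; stands in for <version>

for name, config in (("option 1", option_1), ("option 2", option_2)):
    if versioning_requested(config):
        path = versioned_partition_path(config["path"], "first_station.csv", example_version)
        print(f"{name}: {path}")
```

Both configs produce the same per-partition path, so the choice between 1 and 2 is about which config reads more clearly, not about where files end up.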

2️⃣ 1

3️⃣ 1

1️⃣ 3
