ladies & gents, any plan to make partitioned datasets compatible with versioning? | Kedro #questions

# questions

Gauthier Pierard

04/01/2025, 12:14 PM

ladies&gents, any plan to make partitioned datasets compatible with versioning?

Nok Lam Chan

04/01/2025, 12:18 PM

IIRC, it supports versioning already? https://github.com/kedro-org/kedro-plugins/pull/447

Gauthier Pierard

04/01/2025, 1:02 PM

@Nok Lam Chan any idea why I get this on kedro, version 0.19.12 ?

DatasetError: 
PartitionedDataset.__init__() got an unexpected keyword argument 'version'.
Dataset 'usp.contextualized_deltav' must only contain arguments valid for the constructor of 
'kedro_datasets.partitions.partitioned_dataset.PartitionedDataset'.

catalog.yml:

mynamespace.mydataset:
  versioned: true
  type: partitions.PartitionedDataset
  path: ${_data_prefix}/TEMP/PARTITION_TEST/
  dataset: pandas.ParquetDataset
  filename_suffix: ".parquet"
  overwrite: false
  credentials: adls_creds

Nok Lam Chan

04/01/2025, 1:25 PM

I think the versioned key may go under your dataset. PartitionedDataset is just a wrapper, so if the underlying dataset does not support versioning by itself, PartitionedDataset cannot support it either.

Nok Lam Chan

04/01/2025, 1:25 PM

mynamespace.mydataset:
  type: partitions.PartitionedDataset
  path: ${_data_prefix}/TEMP/PARTITION_TEST/
  dataset:
    type: pandas.ParquetDataset
    versioned: true  # something like this, I think
  filename_suffix: ".parquet"
  overwrite: false
  credentials: adls_creds

Gauthier Pierard

04/01/2025, 2:24 PM

Thanks, that works. However, it's not suited to my case, where I'd like to save all files in one folder.

Nok Lam Chan

04/01/2025, 2:32 PM

hmm

Nok Lam Chan

04/01/2025, 2:32 PM

What's the current behavior?

Nok Lam Chan

04/01/2025, 2:34 PM

@Deepyaman Datta do you remember this?

Gauthier Pierard

04/01/2025, 2:34 PM

each key of the partition has its own folder in which the timestamped versions exist (each one in a subfolder)
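The layout described here, and the one being asked for, can be sketched with plain path arithmetic. This is only an illustration, assuming Kedro's usual `<filepath>/<version>/<filename>` convention for versioned file datasets; the function names, partition keys, and timestamp are hypothetical.

```python
from pathlib import PurePosixPath


def per_partition_versioned(base, key, version, suffix=".parquet"):
    """Layout reported in this thread: one folder per partition key,
    with each timestamped version in its own subfolder."""
    filename = f"{key}{suffix}"
    return PurePosixPath(base) / filename / version / filename


def one_folder_per_run(base, key, version, suffix=".parquet"):
    """Layout the users want: one folder per run (version)
    that holds every partition file."""
    return PurePosixPath(base) / version / f"{key}{suffix}"


version = "2025-04-01T12.00.00.000Z"  # hypothetical run timestamp
keys = ("a", "b")                     # hypothetical partition keys

current = [str(per_partition_versioned("TEMP/PARTITION_TEST", k, version)) for k in keys]
wanted = [str(one_folder_per_run("TEMP/PARTITION_TEST", k, version)) for k in keys]

for path in current + wanted:
    print(path)
```

In the first layout the files of a single run are scattered across one folder per key; in the second they sit side by side under the run's timestamp.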

Gauthier Pierard

04/01/2025, 2:35 PM

By the way, my current implementation is described here: https://kedro-org.slack.com/archives/C03RKP2LW64/p1743172644361429. It basically creates a dynamic dataset in one of the nodes.

Deepyaman Datta

04/01/2025, 2:44 PM

It's been a while. https://github.com/kedro-org/kedro-plugins/pull/447 introduced versioning of the underlying dataset.

Deepyaman Datta

04/01/2025, 2:46 PM

So yes, what @Nok Lam Chan said; the key should be on the underlying dataset

👍 1

Deepyaman Datta

04/01/2025, 2:52 PM

If I understand, @Gauthier Pierard, you want to define partitions (and potentially how they evolve) for each version? I think that's very reasonable, but confusing to support in Kedro because partitions and versions are both defined by folder structure. It was a very long battle to even agree on this way of supporting versioning, and it already makes assumptions. Your options are (1) create a custom dataset and make some assumptions, or (2) look into using something like Iceberg for storage, which should define this behavior much more reasonably.
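Option (1) could look roughly like the sketch below. It is not a Kedro class and not an existing API: `RunFolderPartitions` is a hypothetical name, it uses only the standard library, and it writes text files instead of Parquet to stay self-contained. To use it from a catalog it would presumably need to subclass `kedro.io.AbstractDataset` and delegate reading and writing to a real dataset such as `pandas.ParquetDataset`. The assumption it makes is that one run equals one timestamped folder containing every partition.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class RunFolderPartitions:
    """Hypothetical sketch: stores all partitions of one run in <base>/<version>/."""

    def __init__(self, base, suffix=".txt", version=None):
        self._base = Path(base)
        self._suffix = suffix
        # Assumption: one timestamp per run, in a sortable Kedro-like format.
        self._version = version or datetime.now(timezone.utc).strftime(
            "%Y-%m-%dT%H.%M.%S.%fZ"
        )

    def save(self, partitions):
        """Write each {key: text} entry as <base>/<version>/<key><suffix>."""
        run_dir = self._base / self._version
        run_dir.mkdir(parents=True, exist_ok=True)
        for key, text in partitions.items():
            (run_dir / f"{key}{self._suffix}").write_text(text)

    def load(self, version=None):
        """Return {key: loader} for one version, defaulting to the latest one."""
        versions = sorted(p.name for p in self._base.iterdir() if p.is_dir())
        if version is None and not versions:
            raise FileNotFoundError(f"no versions found under {self._base}")
        run_dir = self._base / (version or versions[-1])
        # Loaders are lazy callables, so nothing is read until they are called.
        return {
            p.name.removesuffix(self._suffix): p.read_text
            for p in sorted(run_dir.glob(f"*{self._suffix}"))
        }


with tempfile.TemporaryDirectory() as tmp:
    ds = RunFolderPartitions(tmp, version="2025-04-01T12.00.00.000Z")
    ds.save({"a": "alpha", "b": "beta"})
    loaded = {key: load() for key, load in ds.load().items()}
    print(loaded)
```

Picking the latest version by sorting folder names only works because the timestamp format sorts lexicographically; a different naming scheme would need its own ordering rule.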

Gauthier Pierard

04/01/2025, 2:54 PM

Yes the users want all their files in one folder per run basically. Thanks for the suggestions

Deepyaman Datta

04/01/2025, 2:56 PM

Improving the partitioned and incremental datasets is something we want to tackle at some point, but unfortunately I don't know when that would be (and it's not at all clear what it would mean). Cc @Juan Luis @Merel just in case you think it's worth recording from a priorities perspective :)
