# questions
g
ladies&gents, any plan to make partitioned datasets compatible with versioning?
n
IIRC, it supports versioning already? https://github.com/kedro-org/kedro-plugins/pull/447
g
@Nok Lam Chan any idea why I get this on Kedro version 0.19.12?
DatasetError: 
PartitionedDataset.__init__() got an unexpected keyword argument 'version'.
Dataset 'usp.contextualized_deltav' must only contain arguments valid for the constructor of 
'kedro_datasets.partitions.partitioned_dataset.PartitionedDataset'.
catalog.yml:
mynamespace.mydataset:
  versioned: true
  type: partitions.PartitionedDataset
  path: ${_data_prefix}/TEMP/PARTITION_TEST/
  dataset: pandas.ParquetDataset
  filename_suffix: ".parquet"
  overwrite: false
  credentials: adls_creds
n
hm
I think the `versioned` key may go under your `dataset`. PartitionedDataset is just a wrapper, so if the underlying dataset does not support versioning by itself, PartitionedDataset cannot support it either.
mynamespace.mydataset:
  type: partitions.PartitionedDataset
  path: ${_data_prefix}/TEMP/PARTITION_TEST/
  dataset:
    type: pandas.ParquetDataset
    versioned: true  # something like this, I think
  filename_suffix: ".parquet"
  overwrite: false
  credentials: adls_creds
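A rough sketch of why the placement of `versioned` matters, assuming (as the error message suggests) that the flag is turned into a `version` keyword argument for whichever constructor the config block belongs to. The classes and the `build` helper below are simplified, hypothetical stand-ins, not the real Kedro implementation.

```python
# Simplified stand-ins for illustration only; NOT the real Kedro classes.
class FakeParquetDataset:
    def __init__(self, filepath, version=None):
        self.filepath = filepath
        self.version = version


class FakePartitionedDataset:
    # No `version` parameter, mirroring the error message in the thread.
    def __init__(self, path, dataset, filename_suffix=""):
        self.path = path
        self.dataset = dataset
        self.filename_suffix = filename_suffix


def build(cls, config):
    """Hypothetical loader: map `versioned: true` to a `version` kwarg."""
    kwargs = dict(config)
    if kwargs.pop("versioned", False):
        kwargs["version"] = "<timestamp>"  # placeholder for a real version object
    return cls(**kwargs)


def try_build(cls, config):
    """Return 'ok' or the constructor error as a string."""
    try:
        build(cls, config)
        return "ok"
    except TypeError as exc:
        return f"error: {exc}"


# `versioned` at the top level reaches the wrapper's constructor and fails.
top_level = {"versioned": True, "path": "data/", "dataset": "pandas.ParquetDataset"}
# `versioned` on the underlying dataset reaches a constructor that accepts it.
nested = {"versioned": True, "filepath": "data/part_a.parquet"}

top_level_result = try_build(FakePartitionedDataset, top_level)
nested_result = try_build(FakeParquetDataset, nested)
print(top_level_result)
print(nested_result)
```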
g
Thanks, that works. However, it's not suited to my case, where I'd like to save all files in one folder.
n
hmm
What's the current behavior?
@Deepyaman Datta do you remember this?
g
each key of the partition has its own folder in which the timestamped versions exist (each one in a subfolder)
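To make that layout concrete, here is a small sketch of the resulting paths. It assumes the usual Kedro versioned layout of `<filepath>/<version>/<filename>` applied to each partition; the function name and the example version string are made up for illustration.

```python
from pathlib import PurePosixPath


def versioned_partition_path(base, partition, suffix, version):
    """Path of one partition when the underlying dataset is versioned.

    Assumed layout: <base>/<partition><suffix>/<version>/<partition><suffix>
    """
    partition_file = PurePosixPath(base) / f"{partition}{suffix}"
    return str(partition_file / version / partition_file.name)


version = "2025-03-28T14.00.00.000Z"  # example timestamp, not from the thread
paths = [
    versioned_partition_path("TEMP/PARTITION_TEST", name, ".parquet", version)
    for name in ("part_a", "part_b")
]
for path in paths:
    print(path)
```

Each partition ends up in its own folder tree, so the files of a single run are spread across as many folders as there are partitions.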
by the way my current implementation is described here https://kedro-org.slack.com/archives/C03RKP2LW64/p1743172644361429 basically creating a dynamic dataset in one of the nodes
d
It's been a while. https://github.com/kedro-org/kedro-plugins/pull/447 introduced versioning of the underlying dataset.
So yes, what @Nok Lam Chan said; the key should be on the underlying dataset
If I understand correctly, @Gauthier Pierard, you want to define partitions (and potentially how they evolve) for each version? I think that's very reasonable, but confusing to support in Kedro because partitions and versions are both defined by folder structure. It was a very long battle to even agree on getting this way of supporting versioning in, and it relies on some assumptions. Your options are (1) create a custom dataset, making some assumptions of your own, or (2) look into using something like Iceberg for storage, which should define this behavior much more reasonably.
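For option (1), a minimal sketch of the "one folder per run" idea, using only the standard library. The class name, the `run_id` parameter, and the raw-bytes payloads are all hypothetical; a real Kedro custom dataset would subclass `AbstractDataset` and delegate reading and writing to the underlying dataset type. The sketch also assumes run ids sort lexicographically, as timestamps do.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class RunFolderPartitionedStore:
    """Hypothetical helper: every run writes all its partitions into one folder."""

    def __init__(self, path, filename_suffix="", run_id=None):
        self._root = Path(path)
        self._suffix = filename_suffix
        # Default run id is a UTC timestamp, so newer runs sort last.
        self._run_id = run_id or datetime.now(timezone.utc).strftime(
            "%Y-%m-%dT%H.%M.%S.%fZ"
        )

    def save(self, partitions):
        """Write {partition_name: bytes} to <root>/<run_id>/<name><suffix>."""
        run_dir = self._root / self._run_id
        run_dir.mkdir(parents=True, exist_ok=True)
        for name, payload in partitions.items():
            (run_dir / f"{name}{self._suffix}").write_bytes(payload)
        return run_dir

    def load(self, run_id=None):
        """Read all partitions of one run; defaults to the latest run folder."""
        if run_id is None:
            runs = sorted(p.name for p in self._root.iterdir() if p.is_dir())
            if not runs:
                raise FileNotFoundError(f"no runs found under {self._root}")
            run_id = runs[-1]
        run_dir = self._root / run_id
        return {
            p.name.removesuffix(self._suffix): p.read_bytes()
            for p in sorted(run_dir.iterdir())
            if p.name.endswith(self._suffix)
        }


# Usage: two partitions saved by one run land side by side in one folder.
with tempfile.TemporaryDirectory() as tmp:
    store = RunFolderPartitionedStore(tmp, ".bin", run_id="run_001")
    run_dir = store.save({"part_a": b"a", "part_b": b"b"})
    saved_files = sorted(p.name for p in run_dir.iterdir())
    loaded = store.load()

print(saved_files)
print(loaded)
```

The trade-off is the assumption mentioned above: the set of partitions is fixed per run, and loading a version means loading that run's folder as a whole.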
g
Yes the users want all their files in one folder per run basically. Thanks for the suggestions
d
Improving the partitioned and incremental datasets is something we want to tackle at some point, but unfortunately I don't know when that would be (and it's not at all clear what that would mean). Cc @Juan Luis @Merel just in case you think it's worth recording from a priorities perspective :)