Hugo Barreto
01/17/2025, 7:02 PM
"{company}.{layer}.transactions":
  type: pandas.ParquetDataset
  filepath: data/{company}/{layer}/transactions
  save_args:
    partition_cols: [year, month]
The error:
DatasetError: ParquetDataset does not support save argument 'partition_cols'. Please use 'kedro.io.PartitionedDataset' instead.
How am I supposed to do it using PartitionedDataset, and what is the reason behind blocking the use of partition_cols in pandas.ParquetDataset? (I'm asking because I could just override it with a custom Dataset.)

Hall
01/17/2025, 7:02 PM

Ravi Kumar Pilla
01/17/2025, 7:19 PM
partition_cols is not supported. Maybe @Nok Lam Chan or someone has a better idea, as this has been around from the start. You can do this using PartitionedDataset as mentioned here, and use the arg dataset: pandas.ParquetDataset as the underlying dataset. Thank you

Hugo Barreto
01/17/2025, 7:24 PM

Nok Lam Chan
01/17/2025, 7:36 PM

Nok Lam Chan
01/17/2025, 7:36 PM

Nok Lam Chan
01/17/2025, 7:37 PM
fast_parquet engine. These days the default is pyarrow, and it is supported by pandas out of the box.

Nok Lam Chan
01/17/2025, 7:38 PM
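
For anyone landing on this thread: below is a minimal sketch of the PartitionedDataset route Ravi describes, not an official recipe. It assumes the catalog entry is switched to a PartitionedDataset with dataset: pandas.ParquetDataset as the underlying dataset and a ".parquet" filename suffix, and that the node returns a dict mapping partition ids to DataFrames. The function name partition_by_year_month and the Hive-style "year=.../month=..." key layout are hypothetical choices made for illustration; the year and month columns are kept inside each partition so nothing is lost on load.

```python
import pandas as pd


def partition_by_year_month(df: pd.DataFrame) -> dict[str, pd.DataFrame]:
    """Split a DataFrame into one partition per (year, month) pair.

    Returns a mapping of partition id -> DataFrame, which is the shape
    a Kedro PartitionedDataset accepts on save (assumed setup, see above).
    """
    partitions: dict[str, pd.DataFrame] = {}
    for (year, month), group in df.groupby(["year", "month"]):
        # Hypothetical key layout mimicking pyarrow's partition_cols folders;
        # the dataset's filename suffix would be appended to each key.
        key = f"year={year}/month={month}/part"
        partitions[key] = group.reset_index(drop=True)
    return partitions


# Small illustrative frame (made-up values).
transactions = pd.DataFrame(
    {
        "year": [2024, 2024, 2025],
        "month": [12, 12, 1],
        "amount": [10.0, 20.0, 30.0],
    }
)
parts = partition_by_year_month(transactions)
```

Returning parts from a node whose output is the PartitionedDataset entry would then write one Parquet file per key under the dataset path, assuming the setup described above.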