Daniel Lee
07/18/2023, 8:42 AM
In the DataCatalog, I would like to use pandas.ParquetDataset to partition by the date in the dataset and save into different folders by date in Parquet, like how we can do it for spark.SparkDataSet. Is there a way we could partition using pandas?
Nok Lam Chan
07/18/2023, 10:22 AM
For ParquetDataSet, I advise using whatever Parquet offers, because it’s a native implementation and you often get better predicate pushdown for performance.
Regarding pandas, any dataset that does not offer partitioning can be partitioned with PartitionedDataSet:
https://docs.kedro.org/en/stable/kedro.io.PartitionedDataset.html#kedro.io.PartitionedDataset
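A minimal sketch of the PartitionedDataSet route, assuming the DataFrame has a column named date (a hypothetical name) and that the node's output is registered in the catalog as a PartitionedDataSet whose underlying dataset is pandas.ParquetDataSet. PartitionedDataSet saves a dictionary of partition ID to data, so the node only needs to split the frame. The function name split_by_date and the key format are illustrative, not part of Kedro.

```python
from typing import Dict

import pandas as pd


def split_by_date(df: pd.DataFrame, date_col: str = "date") -> Dict[str, pd.DataFrame]:
    """Split a DataFrame into one DataFrame per calendar date.

    The returned dict maps a partition ID (here the ISO date string)
    to the rows for that date. `date_col` is an assumed column name.
    """
    # Normalise the column to YYYY-MM-DD strings so they can serve as partition IDs.
    days = pd.to_datetime(df[date_col]).dt.strftime("%Y-%m-%d")
    return {
        day: part.reset_index(drop=True)
        for day, part in df.groupby(days)
    }


# Example input with two distinct dates (made-up values).
example = pd.DataFrame(
    {
        "date": ["2023-07-17", "2023-07-17", "2023-07-18"],
        "value": [1, 2, 3],
    }
)
partitions = split_by_date(example)
```

Returning this dict from a node should write one Parquet file per date under the PartitionedDataSet path. Whether a key such as 2023-07-17/data produces a per-date folder depends on the filesystem and dataset settings, so check that against the linked documentation.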