Benjamin Cheung
06/30/2024, 6:55 PM
Yury Fedotov
07/01/2024, 1:51 AM
PartitionedDataset (docs).
In short, you define it like this:
my_partitioned_dataset:
  type: partitions.PartitionedDataset
  path: s3://my-bucket-name/path/to/folder  # path to the location of partitions
  dataset: pandas.CSVDataset  # shorthand notation for the dataset which will handle individual partitions
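For what it's worth, a node that takes my_partitioned_dataset as input could look roughly like the sketch below. It assumes Kedro's documented lazy loading, where each value in the dict is a zero-argument function that reads one partition when called, not the already-loaded data. The name count_rows_per_partition and the stand-in loaders are made up for illustration; plain lists play the role of DataFrames so the snippet runs without Kedro or pandas.

```python
from typing import Any, Callable, Dict


def count_rows_per_partition(
    partitions: Dict[str, Callable[[], Any]],
) -> Dict[str, int]:
    """Hypothetical node: return the number of rows in each partition."""
    row_counts = {}
    # Sort the keys so partitions are processed in a stable order.
    for partition_id, load_partition in sorted(partitions.items()):
        data = load_partition()  # the partition is only read at this point
        row_counts[partition_id] = len(data)  # len() of a DataFrame is its row count
    return row_counts


# Stand-in for what Kedro would pass in (hypothetical partition names).
fake_partitions = {
    "2024-01.csv": lambda: [{"a": 1}, {"a": 2}],
    "2024-02.csv": lambda: [{"a": 3}],
}
print(count_rows_per_partition(fake_partitions))
```

Because the loaders are only called inside the loop, just one partition has to be in memory at a time, which matters when the folder holds many large files.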
And that means:
1. Go to the folder specified in path
2. Treat every item in it as an individual dataset (in this case pandas.CSVDataset)
3. On load, it returns a dict[str, Callable]: each key is a partition ID (the file's path relative to path, so typically the filename) and each value is a function that loads that partition when you call it. In the example above, calling it gives you a pd.DataFrame.
Richard Purvis
07/01/2024, 8:17 PM