Brandon Meek
01/07/2023, 3:03 AM
William Caicedo
01/07/2023, 5:28 AM
Brandon Meek
01/07/2023, 3:08 PM
Deepyaman Datta
01/07/2023, 10:03 PM
PartitionedDataSet?
pipeline in a loop.
Brandon Meek
01/07/2023, 10:11 PM
Is PartitionedDataSet just a distributed dataset? Ultimately, what I'd like to do is write a parameter like:
Datasets:
- Ds1
- Ds2
- DS3
...
And pass that to a pipeline so that it uses all three of those datasets. One example would be to merge all of the provided datasets on a specific key.
Deepyaman Datta
01/08/2023, 12:00 AM
PartitionedDataSet isn't really distributed. If DS1, DS2, DS3 are all under the same path:
path/to/pds/ds1.csv
path/to/pds/ds2.csv
path/to/pds/ds3.csv
And you define a catalog entry:
my_pds:
  type: PartitionedDataSet
  dataset:
    type: pandas.CSVDataSet
  path: path/to/pds
  filename_suffix: .csv
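For illustration, here is a minimal sketch of a node function that could consume an entry like this and merge every partition on a shared key. On load, a PartitionedDataSet hands the node a dictionary mapping each partition ID (here `ds1`, `ds2`, `ds3`) to a function that loads that partition, so the data is only read when the function is called. The function name `merge_partitions`, the key column `id`, and the sample frames are all hypothetical, and the dictionary of lambdas only stands in for what Kedro would pass at run time.

```python
from functools import reduce
from typing import Callable, Dict

import pandas as pd


def merge_partitions(
    partitions: Dict[str, Callable[[], pd.DataFrame]], key: str = "id"
) -> pd.DataFrame:
    """Load every partition and inner-join them on a shared key column."""
    # Each value is a load function; calling it reads that partition.
    # Sorting by partition ID keeps the merge order deterministic.
    frames = [load() for _, load in sorted(partitions.items())]
    if not frames:
        raise ValueError("no partitions found")
    return reduce(
        lambda left, right: left.merge(right, on=key, how="inner"), frames
    )


# Stand-in for the dict Kedro would pass for my_pds (hypothetical data).
fake_partitions = {
    "ds1": lambda: pd.DataFrame({"id": [1, 2], "a": [10, 20]}),
    "ds2": lambda: pd.DataFrame({"id": [1, 2], "b": [30, 40]}),
    "ds3": lambda: pd.DataFrame({"id": [2, 3], "c": [50, 60]}),
}

merged = merge_partitions(fake_partitions)
print(merged)
```

In a pipeline this would presumably be wired up as something like `node(merge_partitions, inputs="my_pds", outputs="merged")`, where `merged` is a hypothetical output name. Adding another file under the same path would then change the result without touching the pipeline definition.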
You can use my_pds as a node input and iterate over the 3 dataframes in there.
Ian Anderson
01/08/2023, 5:10 PM
Brandon Meek
01/08/2023, 5:11 PM
Ian Anderson
01/08/2023, 5:17 PM