Thiago José Moser Poletto
12/18/2024, 3:06 PM
my_partitioned_dataset:
  type: partitions.PartitionedDataset
  path: data/02_intermediate  # path to the location of partitions
  dataset: pandas.CSVDataset
Hall
12/18/2024, 3:06 PM
Ravi Kumar Pilla
12/18/2024, 3:45 PM
Thiago José Moser Poletto
12/18/2024, 4:17 PM
Ravi Kumar Pilla
12/18/2024, 4:23 PM
Ravi Kumar Pilla
12/18/2024, 4:25 PM
Thiago José Moser Poletto
12/18/2024, 7:35 PM
Thiago José Moser Poletto
12/18/2024, 7:38 PM
%load_ext kedro.ipython
%reload_kedro ../
catalog.list()
[
'companies',
'historical_product_demand',
'my_partitioned_dataset',
'reviews',
'shuttles_excel',
'shuttles@csv',
'shuttles@spark',
'preprocessed_companies',
'preprocessed_shuttles',
'preprocessed_reviews',
'model_input_table@spark',
'model_input_table@pandas',
'regressor',
'metrics',
'companies_columns',
'shuttle_passenger_capacity_plot_exp',
'shuttle_passenger_capacity_plot_go',
'dummy_confusion_matrix',
'parameters',
'params:model_options',
'params:model_options.test_size',
'params:model_options.random_state',
'params:model_options.features'
]
my_partitioned_dataset = catalog.load('my_partitioned_dataset')
Ravi Kumar Pilla
12/18/2024, 9:05 PM
Ankita Katiyar
12/19/2024, 10:02 AM
With catalog.load(), it’ll be a dict with the partition name and its corresponding load function. You can iterate over it to load the individual partitions:
my_partitioned_dataset = catalog.load('my_partitioned_dataset')
for file, func in my_partitioned_dataset.items():
    data = func()
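To make the lazy loading a bit more concrete, here is a minimal sketch that collects every partition into a dict keyed by partition name. It doesn't need Kedro to run: `fake_partitions` is a hypothetical stand-in for what `catalog.load('my_partitioned_dataset')` returns, and the partition names, the rows, and the `load_all` helper are all made up for illustration.

```python
from typing import Any, Callable, Dict


def load_all(partitions: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Call each partition's load function and keep the result under its name."""
    # Nothing is read until load() is called, so partitions are loaded one by one.
    return {name: load() for name, load in sorted(partitions.items())}


# Hypothetical stand-in for catalog.load('my_partitioned_dataset'):
# a dict mapping partition name -> zero-argument load function.
fake_partitions = {
    "2024-02.csv": lambda: [{"product": "A", "demand": 12}],
    "2024-01.csv": lambda: [{"product": "A", "demand": 10}],
}

loaded = load_all(fake_partitions)
```

With a real pandas-backed partitioned dataset, each value in `loaded` would be a DataFrame, so something like `pd.concat(loaded.values())` could then combine them if that's what you need.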
Thiago José Moser Poletto
12/19/2024, 12:24 PM
'.gitkeep': <bound method CSVDataset._load of kedro_datasets.pandas.csv_dataset.CSVDataset(filepath=PurePosixPath('/home/jupyter/demand-forecast-gcp-kedro/pdi-demand-forecast/data/02_intermediate/.gitkeep'), protocol='file', load_args={}, save_args={'index': False})>,
Thiago José Moser Poletto
12/19/2024, 12:25 PM
Ankita Katiyar
12/19/2024, 1:08 PM
Thiago José Moser Poletto
12/19/2024, 1:15 PM
Ankita Katiyar
12/19/2024, 1:30 PM
There’s a .gitkeep file in the data folders so they can be uploaded to GitHub, as Git doesn’t track empty folders. You can delete these files when the folders actually contain something. I’d also recommend creating a folder within 02_intermediate for the actual data.
Thiago José Moser Poletto
12/19/2024, 1:31 PM
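If deleting .gitkeep or moving the data into a subfolder isn't an option right away, one possible workaround is to skip anything that doesn't look like a data file before calling its load function. This is only a sketch under assumptions: `only_data_partitions` is a hypothetical helper, the `partitions` dict is a stand-in for the real `catalog.load(...)` result, and it assumes the partition names still carry their `.csv` extension.

```python
from typing import Any, Callable, Dict


def only_data_partitions(
    partitions: Dict[str, Callable[[], Any]], suffix: str = ".csv"
) -> Dict[str, Callable[[], Any]]:
    """Keep only the partitions whose name ends with the given suffix."""
    return {name: load for name, load in partitions.items() if name.endswith(suffix)}


def _placeholder_loader() -> Any:
    # Stand-in for trying to read a placeholder file such as .gitkeep as CSV.
    raise ValueError("not a CSV file")


# Hypothetical stand-in for catalog.load('my_partitioned_dataset').
partitions = {
    ".gitkeep": _placeholder_loader,
    "2024-01.csv": lambda: "rows for 2024-01",
}

# Only the .csv partition is loaded; the .gitkeep loader is never called.
data = {name: load() for name, load in only_data_partitions(partitions).items()}
```

PartitionedDataset also accepts a `filename_suffix` option in the catalog entry (for example `filename_suffix: ".csv"`), which should do the same filtering at the config level. As far as I know, the suffix is then stripped from the partition names.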