Thiago José Moser Poletto
12/18/2024, 3:06 PM
my_partitioned_dataset:
  type: partitions.PartitionedDataset
  path: data/02_intermediate # path to the location of partitions
  dataset: pandas.CSVDataset

Hall
12/18/2024, 3:06 PM

Ravi Kumar Pilla
12/18/2024, 3:45 PM

Thiago José Moser Poletto
12/18/2024, 4:17 PM

Ravi Kumar Pilla
12/18/2024, 4:23 PM

Ravi Kumar Pilla
12/18/2024, 4:25 PM

Thiago José Moser Poletto
12/18/2024, 7:35 PM

Thiago José Moser Poletto
12/18/2024, 7:38 PM
%load_ext kedro.ipython
%reload_kedro ../
catalog.list()
[
    'companies',
    'historical_product_demand',
    'my_partitioned_dataset',
    'reviews',
    'shuttles_excel',
    'shuttles@csv',
    'shuttles@spark',
    'preprocessed_companies',
    'preprocessed_shuttles',
    'preprocessed_reviews',
    'model_input_table@spark',
    'model_input_table@pandas',
    'regressor',
    'metrics',
    'companies_columns',
    'shuttle_passenger_capacity_plot_exp',
    'shuttle_passenger_capacity_plot_go',
    'dummy_confusion_matrix',
    'parameters',
    'params:model_options',
    'params:model_options.test_size',
    'params:model_options.random_state',
    'params:model_options.features'
]
my_partitioned_dataset = catalog.load('my_partitioned_dataset')

Ravi Kumar Pilla
12/18/2024, 9:05 PM

Ankita Katiyar
12/19/2024, 10:02 AM
catalog.load() it'll be a dict mapping each partition name to its corresponding load function. You can iterate over it to load the individual partitions:
my_partitioned_dataset = catalog.load('my_partitioned_dataset')
for file, func in my_partitioned_dataset.items():
    # calling func() loads the partition stored under the name `file`
    data = func()

Thiago José Moser Poletto
12/19/2024, 12:24 PM
'.gitkeep': <bound method CSVDataset._load of kedro_datasets.pandas.csv_dataset.CSVDataset(filepath=PurePosixPath('/home/jupyter/demand-forecast-gcp-kedro/pdi-demand-forecast/data/02_intermediate/.gitkeep'), protocol='file', load_args={}, save_args={'index': False})>,

Thiago José Moser Poletto
12/19/2024, 12:25 PM

Ankita Katiyar
12/19/2024, 1:08 PM

Thiago José Moser Poletto
12/19/2024, 1:15 PM

Ankita Katiyar
12/19/2024, 1:30 PM
.gitkeep file in the data folders so they can be pushed to GitHub, as Git doesn't track empty folders. You can delete these files once the folders actually contain something. I'd also recommend creating a folder within 02_intermediate for the actual data.

Thiago José Moser Poletto
12/19/2024, 1:31 PM
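
The iteration pattern described above (a dict of partition names mapped to load functions, where a stray .gitkeep shows up as its own partition) can be sketched in plain Python without Kedro. This is a minimal, hypothetical stand-in, not Kedro's implementation: `make_loader` and `load_csv_partitions` are illustrative names, the in-memory CSV strings are made-up toy data, and the dict only imitates what catalog.load() returns for a PartitionedDataset.

```python
import csv
import io
from typing import Callable, Dict, List

Rows = List[dict]


def make_loader(text: str) -> Callable[[], Rows]:
    """Hypothetical stand-in for a partition's load function: parses CSV text on call."""
    def _load() -> Rows:
        return list(csv.DictReader(io.StringIO(text)))
    return _load


def load_csv_partitions(partitions: Dict[str, Callable[[], Rows]],
                        suffix: str = ".csv") -> Dict[str, Rows]:
    """Call each partition's loader lazily, skipping partition names that do not
    end with `suffix` (e.g. a '.gitkeep' placeholder, which is not a data file)."""
    loaded: Dict[str, Rows] = {}
    for partition_id, load_func in sorted(partitions.items()):
        if not partition_id.endswith(suffix):
            continue  # placeholder or unrelated file: never call its loader
        loaded[partition_id] = load_func()
    return loaded


if __name__ == "__main__":
    # Toy data standing in for the files under data/02_intermediate
    partitions = {
        "2024-01.csv": make_loader("sku,demand\nA,10\nB,5\n"),
        "2024-02.csv": make_loader("sku,demand\nA,12\n"),
        ".gitkeep": make_loader(""),
    }
    data = load_csv_partitions(partitions)
    print(sorted(data))  # ['2024-01.csv', '2024-02.csv']
```

Filtering on the partition name before calling the loader keeps loading lazy: a placeholder's loader is never invoked. In a real Kedro project, a similar effect should be achievable in the catalog entry itself through PartitionedDataset's `filename_suffix` option (e.g. `filename_suffix: ".csv"`), which restricts processing to files ending in that string; together with moving the data into its own subfolder, as suggested above, this keeps .gitkeep out of the loaded dict.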