Hugo Acosta
10/01/2024, 2:55 PM
data_{year}:
  type: pandas.ExcelDataset
  filepath: reports/folder/data_{year}.xlsx
  save_args:
    index: False
Then I have another pipeline that aggregates all the files for processing, loading them as a PartitionedDataset with this entry:
partitioned_data:
  type: partitions.PartitionedDataset
  path: reports/folder
  dataset:
    type: pandas.ExcelDataset
The main problem with my approach is that even though these two entries refer to the same data, they are in fact different entries, so Kedro runs the second pipeline before the dynamic one.
I would appreciate your input on this issue,
Thanks a lot!
Nok Lam Chan
10/01/2024, 3:10 PM
"The main problem with my approach is that even though these two entries refer to the same data, they are in fact different entries, so Kedro runs the second pipeline before the dynamic one."
Is it possible to use a PartitionedDataset instead of the dynamic pipeline in this case? As I understand it, this happens because the pipeline is disconnected: if you visualise it with kedro viz, you will see two separate parts, so Kedro doesn't know that the first one needs to be executed before the other. The other option is to create a dummy input/output pair to ensure the dependencies are resolved correctly.
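To make the dummy input/output idea concrete, here is a toy model in plain Python of how a run order falls out of dataset names. It is only a sketch and not Kedro's actual implementation: the node names, the raw_{year}, written_{year} and summary dataset names, and the list of years are all made up for illustration. Each node is described by its input and output dataset names, and a node depends on whichever node produces one of its inputs.

```python
from graphlib import TopologicalSorter


def dependencies(nodes):
    """Map each node name to the set of nodes that produce one of its inputs.

    `nodes` maps a node name to (inputs, outputs), both lists of dataset names.
    """
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    return {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in nodes.items()
    }


def run_order(nodes):
    """Return node names in an order that respects the dataset dependencies."""
    return list(TopologicalSorter(dependencies(nodes)).static_order())


years = [2022, 2023]  # hypothetical years, for illustration only

# Setup from the thread: the writers output data_{year}, but the aggregator
# reads partitioned_data, a different name, so no node produces its input.
disconnected = {f"write_{y}": ([f"raw_{y}"], [f"data_{y}"]) for y in years}
disconnected["aggregate"] = (["partitioned_data"], ["summary"])

# Dummy pair: each writer also outputs a flag (written_{year}, a made-up name)
# and the aggregator lists those flags as extra inputs.
linked = {
    f"write_{y}": ([f"raw_{y}"], [f"data_{y}", f"written_{y}"]) for y in years
}
linked["aggregate"] = (
    ["partitioned_data"] + [f"written_{y}" for y in years],
    ["summary"],
)

print(dependencies(disconnected)["aggregate"])  # empty: nothing forces an order
print(run_order(linked))  # the aggregator now comes after both writers
```

In Kedro terms this would correspond to adding the flag name to the outputs of each writing node and to the inputs of the aggregating node. The flag can be any small value, since only its name is used to link the two pipelines.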