Hi Folks! I have a `PartitionedDataSet` like this:
```
scenario_x/
├── iter_1/
│   ├── run_1.csv
│   ├── run_2.csv
│   └── run_3.csv
└── iter_2/
    ├── run_1.csv
    ├── run_2.csv
    └── run_3.csv
scenario_y/
├── iter_1/
│   ├── run_1.csv
│   ├── run_2.csv
│   └── run_3.csv
└── iter_2/
    ├── run_1.csv
    ├── run_2.csv
    └── run_3.csv
```
The catalog entries look like this:
```yaml
_partitioned_csvs: &_partitioned_csvs
  type: PartitionedDataSet
  dataset:
    type: pandas.CSVDataSet
    load_args:
      index_col: 0
    save_args:
      index: true
  overwrite: true
  filename_suffix: ".csv"

_partitioned_jsons: &_partitioned_jsons
  type: PartitionedDataSet
  dataset:
    type: json.JSONDataSet
  filename_suffix: ".json"

my_csv_part_ds:
  path: data/07_model_output/my_csv_part_ds
  <<: *_partitioned_csvs

my_json_part_ds:
  path: data/07_model_output/my_json_part_ds
  <<: *_partitioned_jsons
```
When I run the pipeline, the CSV partitioned dataset gets deleted first and then the new partitions get written, while the JSON partitioned dataset remains and new partitions just get added on top. I need a sort of custom behaviour where the 2nd level of the partition gets overwritten, not the 1st level, i.e. in the node that produces the partitioned CSVs, the return value is like this:
```python
def node_that_generates_part_ds(scenario, **kwargs):
    res = {'scenario_x/iter_1/run_1': df1, 'scenario_x/iter_1/run_2': df2, ...}  # and so on
    return res
```
So when the keys of the returned `res` only contain `scenario_x`, then `scenario_y` should NOT get deleted. Can anyone guide me on how I can achieve this? Thanks! 🙂
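In case it helps frame what I'm after: the only direction I've come up with so far is subclassing `PartitionedDataSet` and overriding `_save` so that, instead of wiping the whole `path`, it only removes the first-level folders that appear in the keys being saved. This is just a minimal sketch, assuming Kedro's `PartitionedDataSet` internals (`_overwrite`, `_filesystem`, `_normalized_path`) and posix-style `/` separators in the partition keys; the class name and module path are placeholders of mine:
```python
from kedro.io import PartitionedDataSet


class FirstLevelOverwritePartitionedDataSet(PartitionedDataSet):
    """Like PartitionedDataSet with overwrite=True, but on save it only
    deletes the first-level partitions (e.g. 'scenario_x') that appear in
    the keys of the data being saved, leaving the rest untouched."""

    def _save(self, data):
        if self._overwrite and self._filesystem.exists(self._normalized_path):
            # 'scenario_x/iter_1/run_1' -> 'scenario_x'
            prefixes = {partition_id.split("/", 1)[0] for partition_id in data}
            for prefix in prefixes:
                prefix_path = f"{self._normalized_path}/{prefix}"
                if self._filesystem.exists(prefix_path):
                    self._filesystem.rm(prefix_path, recursive=True)

        # Let the parent class write the partitions, temporarily turning
        # off its "delete the whole path" overwrite behaviour.
        overwrite, self._overwrite = self._overwrite, False
        try:
            super()._save(data)
        finally:
            self._overwrite = overwrite
```
The catalog anchors would then point at it, e.g. `type: my_project.extras.datasets.FirstLevelOverwritePartitionedDataSet` instead of `type: PartitionedDataSet` (module path is just where I'd put it). Not sure whether toggling `self._overwrite` around `super()._save(data)` like this is safe, so happy to hear cleaner options.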