#questions

Emilio Gagliardi

09/13/2023, 9:59 PM
Hi everyone, I have an easy question! 🙂 When I create a pandas CSV dataset, is it possible to append to the file, or can I only write out the entire file? I just want to append a new row of data each time the pipeline runs, but currently it just overwrites the first line every time. I thought maybe I could set `mode="a"` in the catalog, but that didn't seem to have any effect. I also tried loading the CSV by passing the dataset to the node in the pipeline inputs, but that caused an error. What am I missing here? Cheers!
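For reference, plain pandas does honour `mode="a"` when `DataFrame.to_csv` is given a file path, and `header=False` keeps the header from being repeated on later writes. A minimal sketch outside Kedro; the helper `append_row` and the example column names are hypothetical:

```python
import os
import tempfile

import pandas as pd


def append_row(path: str, row: dict) -> None:
    """Append one row to a CSV file, writing the header only once."""
    df = pd.DataFrame([row])  # a list with one dict -> a one-row DataFrame
    write_header = not os.path.exists(path)
    # mode="a" opens the file for appending instead of truncating it
    df.to_csv(path, mode="a", header=write_header, index=False)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "validation_metrics.csv")
        append_row(csv_path, {"accuracy": 0.91, "f1": 0.88})
        append_row(csv_path, {"accuracy": 0.93, "f1": 0.90})
        print(pd.read_csv(csv_path))  # two rows, one header
```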

datajoely

09/13/2023, 10:00 PM
```yaml
save_args:
  mode: a
```
does that not work?

Emilio Gagliardi

09/13/2023, 10:02 PM
I had
```yaml
save_args:
  mode: 'a'
```
I'll remove the quotes. I never know whether I'm supposed to quote strings in YAML or not 🧐
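In YAML, an unquoted `a` and a quoted `'a'` both load as the same string, so the quotes cannot be the culprit. A quick check, assuming PyYAML as a stand-in for whichever YAML loader the project's config loader uses. Quoting does matter for words that YAML 1.1 reads as booleans, such as `no`:

```python
import yaml  # PyYAML

quoted = yaml.safe_load("save_args:\n  mode: 'a'\n")
unquoted = yaml.safe_load("save_args:\n  mode: a\n")
# Both spellings produce the same Python string
assert quoted == unquoted == {"save_args": {"mode": "a"}}

# Quotes matter for YAML 1.1 boolean words such as yes/no/on/off
print(yaml.safe_load("flag: no"))    # {'flag': False}
print(yaml.safe_load("flag: 'no'"))  # {'flag': 'no'}
```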
No change: the CSV file still contains only a single row from the latest run.
```yaml
validation_metrics:
  type: pandas.CSVDataSet
  filepath: "data/08_reporting/validation_metrics.csv"
  load_args:

  save_args:
    index: False
    na_rep: "NaN"
    mode: a
```
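Loading this entry shows what the dataset receives: `save_args` arrives as an ordinary dict, which the `pandas.CSVDataSet` implementation forwards as keyword arguments to `DataFrame.to_csv`, so `mode: a` does get through. PyYAML stands in here for the real config loader, which may post-process values:

```python
import yaml  # PyYAML

CATALOG_ENTRY = """
validation_metrics:
  type: pandas.CSVDataSet
  filepath: "data/08_reporting/validation_metrics.csv"
  load_args:

  save_args:
    index: False
    na_rep: "NaN"
    mode: a
"""

entry = yaml.safe_load(CATALOG_ENTRY)["validation_metrics"]
print(entry["load_args"])  # None: an empty key loads as null
print(entry["save_args"])  # {'index': False, 'na_rep': 'NaN', 'mode': 'a'}
```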
and then in my node:
```python
import pandas as pd

new_data_dict = {...}  # a single dictionary
new_data_df = pd.DataFrame([new_data_dict])
return new_data_df
```

Juan Luis

09/14/2023, 8:41 AM
I tried this locally and it overwrites the file for me too. @Emilio Gagliardi I'm opening an issue about this.
@Emilio Gagliardi please upvote https://github.com/kedro-org/kedro-plugins/issues/336 (react with 👍🏼) and/or add any extra context that might help us prioritize the issue.
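A plausible explanation, to be checked against the kedro-datasets version in use: `pandas.CSVDataSet._save` at the time serialised the DataFrame into an in-memory `BytesIO` buffer with `data.to_csv(path_or_buf=buf, **save_args)`, then wrote that buffer to the target with the filesystem opened in `"wb"` mode. `mode="a"` therefore only affects the empty buffer, and the file is truncated on every save. The sketch below imitates that pattern with a text buffer; `save_via_buffer` is a hypothetical stand-in, not Kedro's code:

```python
import io
import os
import tempfile

import pandas as pd


def save_via_buffer(path: str, data: pd.DataFrame, save_args: dict) -> None:
    """Hypothetical imitation of a dataset that serialises to memory first."""
    buf = io.StringIO()
    # save_args (including mode="a") only apply to this fresh, empty buffer ...
    data.to_csv(buf, **save_args)
    # ... while the target file is opened with "w", which truncates it every time
    with open(path, "w", newline="") as f:
        f.write(buf.getvalue())


if __name__ == "__main__":
    args = {"index": False, "na_rep": "NaN", "mode": "a"}
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "validation_metrics.csv")
        save_via_buffer(target, pd.DataFrame([{"accuracy": 0.91}]), args)
        save_via_buffer(target, pd.DataFrame([{"accuracy": 0.93}]), args)
        print(pd.read_csv(target))  # only the 0.93 row survives
```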

Emilio Gagliardi

09/14/2023, 7:36 PM
Thank you, @Juan Luis! I just added a comment and a thumbs up. I haven't used an incremental dataset before, so I'm not sure whether it would work. I just opened the CSV file manually and wrote to it.
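The manual workaround can be sketched with the standard-library `csv` module, assuming the node is given the target path (for example as a parameter) instead of returning a DataFrame for a catalog dataset. The function name `append_metrics_row` is hypothetical. Writing the file directly from a node bypasses the Data Catalog, so Kedro no longer tracks that output:

```python
import csv
import os


def append_metrics_row(path: str, new_data_dict: dict) -> None:
    """Append one dict as a CSV row, writing the header on first use."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # "a" appends to the end of the file instead of truncating it
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(new_data_dict))
        if is_new:
            writer.writeheader()
        writer.writerow(new_data_dict)
```

`DictWriter` takes the column order from the dict's keys, so this assumes the metrics dict has the same keys in the same order on every run; otherwise the appended rows will not line up with the header.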