hi everyone I have an easy question slightly smiling face wh Kedro #questions

Emilio Gagliardi

09/13/2023, 9:59 PM

hi everyone, I have an easy question! 🙂 when I create a pandas CSV dataset, is it possible to append to the file or only write out the entire file? I just want to append a new row of data each time the pipeline runs, but currently it just overwrites the first line every time. I thought maybe I could set the mode="a" in catalog but that didn't seem to have an effect. I tried to load the csv by passing the dataset to the node in the pipeline inputs but that caused an error. what am I missing here? cheers!

datajoely

09/13/2023, 10:00 PM

save_args:
   mode: a

does that not work?

Emilio Gagliardi

09/13/2023, 10:02 PM

I had

save_args:
  mode: 'a'

I'll remove the quotes. I never know if I'm supposed to quote strings in yaml or not 🧐

Emilio Gagliardi

09/13/2023, 10:08 PM

no change, the csv file just contains a single row from the latest run.

validation_metrics:
  type: pandas.CSVDataSet
  filepath: "data/08_reporting/validation_metrics.csv"
  load_args:

  save_args:
    index: False
    na_rep: "NaN"
    mode: a

and then in my node:

new_data_dict = {...}  # a single dictionary
new_data_df = pd.DataFrame([new_data_dict])
return new_data_df

Juan Luis

09/14/2023, 8:41 AM

I tried this locally and it overwrites the file for me too. @Emilio Gagliardi I'm opening an issue about this

Juan Luis

09/14/2023, 8:47 AM

@Emilio Gagliardi please upvote https://github.com/kedro-org/kedro-plugins/issues/336 (reacting with 👍🏼 ) and/or add any extra context that might help us prioritize the issue

Juan Luis

09/14/2023, 8:51 AM

in the meantime, do Incremental datasets https://docs.kedro.org/en/stable/data/partitioned_and_incremental_datasets.html#incremental-datasets suit your needs?

Emilio Gagliardi

09/14/2023, 7:36 PM

Thank you, @Juan Luis! I just added a comment and a thumbs up. I haven't used an incremental dataset before, I'm not sure if it works. I just manually opened the CSV file and wrote to it.
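
For anyone landing here with the same workaround in mind, appending a row by hand might look roughly like the sketch below. It uses only Python's standard csv module rather than Kedro or pandas, and the helper name append_row is hypothetical, not something from the thread or from Kedro. The header is written only when the file is new or empty, so repeated runs do not duplicate it.

```python
import csv
from pathlib import Path


def append_row(path, row):
    """Append one dict as a CSV row, writing the header only for a new or empty file."""
    path = Path(path)
    # Create the parent folder if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    needs_header = not path.exists() or path.stat().st_size == 0
    # Mode "a" appends; newline="" lets the csv module control line endings.
    with path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        if needs_header:
            writer.writeheader()
        writer.writerow(row)
```

Calling append_row("data/08_reporting/validation_metrics.csv", {...}) once per pipeline run would add one line each time, provided every run passes a dictionary with the same keys in the same order.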

Kevin Hauskins

01/13/2025, 7:59 PM

Hi @Juan Luis and @datajoely, do you know if this issue was resolved? I've tried various combinations of save_args and fs_args but it always seems to overwrite the file. The GitHub issue above was closed.

results:
  type: pandas.CSVDataset
  filepath: data/08_reporting/results.csv
  save_args:
    index: False
    mode: a
  fs_args:
    open_args_save:
      mode: a

Thanks!

Ravi Kumar Pilla

01/13/2025, 8:28 PM

Hi @Kevin Hauskins, the issue seems to be resolved. Please have a look at this PR for the fix details.

Kevin Hauskins

01/13/2025, 8:31 PM

Hi @Ravi Kumar Pilla. I read through that but the file is still being overwritten instead of appended to. It wasn't entirely clear where to include the append argument, but I tried several of the examples without success (including using 'a', 'ab', etc.)

Ravi Kumar Pilla

01/13/2025, 8:38 PM

Can you please confirm the kedro-datasets version you have?
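
One way to check this from Python is sketched below, using the standard library's importlib.metadata. The helper name installed_version is hypothetical. The sketch only reports what is installed; which release first shipped the fix is not stated in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None if kedro-datasets is not installed.
    print("kedro-datasets:", installed_version("kedro-datasets"))
```

Running pip show kedro-datasets in the project's environment should report the same information.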

Ravi Kumar Pilla

01/13/2025, 8:50 PM

I tested locally (kedro-datasets 6.0.0) and it works with:

fs_args:
  open_args_save:
    mode: a

Kevin Hauskins

01/13/2025, 11:05 PM

thanks! I had an earlier version of kedro-datasets.
