#questions

Ankar Yadav

02/06/2023, 12:19 PM
Hi team, one quick question: I am using `pandas.CSVDataSet` to save a file, but when I set `sep` in `save_args`, I get an error:
prm_customer:
  type: pandas.CSVDataSet
  filepath: ${base_path}/${folders.prm}/
  save_args:
    index: False
    sep: "|"
Any idea how to fix this? I am using kedro 0.18.1.

Nok Lam Chan

02/06/2023, 12:28 PM
What error did you get?

Ankar Yadav

02/06/2023, 12:45 PM
Error tokenizing data. C error: Expected 1 fields in line 3, saw 2

datajoely

02/06/2023, 1:04 PM
That's just a badly formatted CSV.
You need to provide an error handling strategy, which `read_csv` exposes as:
`on_bad_lines: {'error', 'warn', 'skip'}`
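A minimal sketch of what `on_bad_lines` does on the read side, assuming pandas 1.3 or later (where this argument was introduced). The sample data and column names are hypothetical. In a Kedro catalog entry this option would presumably go under `load_args`, since those are passed through to `read_csv`.

```python
import io

import pandas as pd

# Hypothetical sample: the third line has one field too many.
raw = "id,city\n1,Paris\n2,Lyon,extra\n3,Nice\n"

# With the default on_bad_lines="error", this read raises
# pandas.errors.ParserError. "skip" drops the malformed line silently;
# "warn" drops it and emits a warning.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df)
```

Note that skipping lines discards data, so this only hides the problem if the real cause is a delimiter mismatch.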

Ankar Yadav

02/06/2023, 1:16 PM
We have a column with address info that contains `,`, which is why I need a different separator to save the files.

datajoely

02/06/2023, 1:17 PM
It's not really a Kedro issue; it's a pandas configuration issue about how to deal with a bad delimiter. Refer to the `read_csv` or `to_csv` docs; all we're doing is passing the arguments through.
👍 1

Ankar Yadav

02/06/2023, 1:20 PM
`to_csv` has a `sep` field. `sep` is supported with `load_args`, but for some reason it doesn't work when I use it with `save_args`.

datajoely

02/06/2023, 1:20 PM
It's not that arg that's failing.
You're getting a tokenizer issue because pandas and the underlying C code can't write at least one of your lines.
Perhaps adding strict quoting will help.
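One way to read "strict quoting" is the `quoting` argument of `to_csv`, which accepts the constants from Python's `csv` module. The sketch below is an illustration under that assumption, with hypothetical data; it quotes every field so that a comma inside an address survives a round trip with the default `,` separator.

```python
import csv
import io

import pandas as pd

# Hypothetical data: the second address contains the default separator.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob"],
        "address": ["1 High St", "2 Low Rd, Springfield"],
    }
)

# quoting=csv.QUOTE_ALL wraps every field in double quotes on write.
text = df.to_csv(index=False, quoting=csv.QUOTE_ALL)
print(text)

# Reading back with default settings keeps the embedded comma in one field.
restored = pd.read_csv(io.StringIO(text))
print(restored)
```

Even the default `csv.QUOTE_MINIMAL` quotes fields that contain the separator, so embedded commas alone should not break a file written by `to_csv`.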

Ankar Yadav

02/06/2023, 1:21 PM
Let me try that, thanks

Nok Lam Chan

02/06/2023, 2:11 PM
If you do `df.to_csv(index=False, sep="|")`, I expect it will throw the same error.
Are you sure this is being thrown when it's writing data, or when reading?
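The write-versus-read question can be checked outside Kedro with a few lines of pandas. This is a sketch with hypothetical data, not the poster's actual file: it writes with `sep="|"`, then reads the result back once with the default separator and once with the matching one.

```python
import io

import pandas as pd
from pandas.errors import ParserError

# Hypothetical data: the second address contains a comma.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob"],
        "address": ["1 High St", "2 Low Rd, Springfield"],
    }
)

# Writing with a pipe separator succeeds.
text = df.to_csv(index=False, sep="|")

# Reading back with the default "," separator is what trips the tokenizer:
# the header looks like a single field, but the comma in the last line
# splits that line into two.
try:
    pd.read_csv(io.StringIO(text))
    message = None
except ParserError as err:
    message = str(err)
print(message)

# Passing the same separator on the read side parses the file cleanly.
restored = pd.read_csv(io.StringIO(text), sep="|")
print(restored)
```

If the error does come from the load step, the catalog entry would need the same `sep` under `load_args` as under `save_args`.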

datajoely

02/06/2023, 2:12 PM
or use a real format like parquet 😛

Ankar Yadav

02/06/2023, 2:13 PM
Yeah, Parquet would be the best thing.
@Nok Lam Chan pandas currently offers `sep` in `to_csv`, but I'm not sure if it's available in the pandas version used with kedro 0.18.1.
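Whether `sep` is accepted by the installed pandas can be checked directly rather than guessed. A small sketch, assuming it is run in the same environment as the Kedro project:

```python
import inspect

import pandas as pd

# Inspect the signature of DataFrame.to_csv in the installed pandas.
params = inspect.signature(pd.DataFrame.to_csv).parameters
has_sep = "sep" in params

print("pandas version:", pd.__version__)
print("to_csv accepts sep:", has_sep)
```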

Sebastian Pehle

02/06/2023, 5:13 PM
I second Nok's question: is it reading or writing the dataset? I have encountered this error with `read_csv` only. And have you tried saving 'dirty' inside the node with a regular `to_csv`?
👍 1