# questions
a
Hi Team, one quick question: I am using pandas.CSVDataSet to save a file, however when I mention `sep` in save_args, it gives me an error:
prm_customer:
  type: pandas.CSVDataSet
  filepath: ${base_path}/${folders.prm}/
  save_args:
    index: False
    sep: "|"
Any idea how to fix this? I am using kedro 0.18.1
n
What error did you get?
a
Error tokenizing data. C error: Expected 1 fields in line 3, saw 2
d
that’s just a badly formatted csv
you need to provide an error handling strategy
on_bad_lines: _{'error', 'warn', 'skip'}_
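A minimal sketch of what the `on_bad_lines` option does, assuming pandas 1.3 or newer (the thread does not say which pandas version is installed). The sample data and variable names are made up for illustration. Note that this is a `read_csv` option, so in a catalog entry it would presumably belong under `load_args` rather than `save_args`.

```python
import io

import pandas as pd

# Made-up sample: the third line has three fields instead of two.
raw = "a,b\n1,2\n3,4,5\n6,7\n"

# Default behaviour (on_bad_lines="error"): the malformed line raises.
try:
    pd.read_csv(io.StringIO(raw))
    strict_error = None
except pd.errors.ParserError as exc:
    strict_error = str(exc)

# on_bad_lines="skip" silently drops the malformed line instead.
skipped = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(strict_error)
print(skipped)
```

Skipping hides data loss, so `"warn"` may be a safer choice while debugging.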
a
We have a column with address info which has `,`, that's why I need a different separator to save files
d
it's not really a kedro issue, it’s a pandas configuration issue on how to deal with a bad delimiter, refer to the `read_csv` or `to_csv` docs, all we’re doing is passing through the arguments there.
👍 1
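The pass-through idea described above can be sketched roughly as follows. `MiniCSVDataset` is a hypothetical stand-in written for this illustration, not Kedro's actual class; it only shows how `save_args` and `load_args` would be forwarded to `to_csv` and `read_csv`.

```python
import io

import pandas as pd


class MiniCSVDataset:
    """Hypothetical stand-in for a CSV dataset; not Kedro's implementation."""

    def __init__(self, buffer, load_args=None, save_args=None):
        self._buffer = buffer
        self._load_args = load_args or {}
        self._save_args = save_args or {}

    def save(self, df):
        # Reset the buffer, then forward save_args unchanged to to_csv.
        self._buffer.seek(0)
        self._buffer.truncate()
        df.to_csv(self._buffer, **self._save_args)

    def load(self):
        # Forward load_args unchanged to read_csv.
        self._buffer.seek(0)
        return pd.read_csv(self._buffer, **self._load_args)


df = pd.DataFrame({"id": [1, 2], "city": ["Oslo", "Rome"]})
dataset = MiniCSVDataset(
    io.StringIO(),
    load_args={"sep": "|"},
    save_args={"index": False, "sep": "|"},
)
dataset.save(df)
loaded = dataset.load()
print(loaded)
```

Because the two argument sets are independent, a separator given only on the save side is not applied when the same file is read back.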
a
`to_csv` has a `sep` field, `sep` is supported with `load_args` but for some reason when I use it with `save_args` it doesn't work
d
it’s not that arg that’s failing
you’re getting a tokenizer issue cos pandas and the underlying C code can’t write at least one of your lines
perhaps adding strict quoting will help
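A small sketch of what "strict quoting" could look like in plain pandas, assuming it means `csv.QUOTE_ALL`. The sample frame is made up. `csv.QUOTE_ALL` is the integer `1`, so in a YAML catalog it would presumably be written as `quoting: 1` under `save_args`; that mapping is an assumption, not something confirmed in the thread.

```python
import csv
import io

import pandas as pd

# Made-up sample with a comma inside the address column.
df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "address": ["1 Main St", "2 High St, Apt 4"]}
)

# Quote every field so embedded commas cannot be mistaken for delimiters.
quoted = df.to_csv(index=False, quoting=csv.QUOTE_ALL)

# The quoted text reads back with the default comma separator.
restored = pd.read_csv(io.StringIO(quoted))
print(quoted)
```

Even with the default `csv.QUOTE_MINIMAL`, pandas quotes fields that contain the delimiter, so a comma inside an address should not by itself corrupt a comma-separated file written by `to_csv`.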
a
Let me try that, thanks
n
if you do `df.to_csv(index=False, sep="|")`, I expect it will throw the same error.
Are you sure this is being thrown when it’s writing data? or read?
d
or use a real format like parquet 😛
a
ya parquet will be the best thing
@Nok Lam Chan currently pandas offers `sep` in `to_csv`, but not sure if it's available in the pandas version used with kedro 0.18.1
s
I 2nd Nok's question: Is it reading or writing the dataset? Have encountered this error with read_csv only. And have you tried saving 'dirty' inside the node with regular to_csv?
👍 1
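One way to check the read-versus-write question outside Kedro is sketched below. It assumes, without confirmation from the thread, that the file is written with `sep="|"` and then read back with the default comma separator; the sample data is made up.

```python
import io

import pandas as pd

# Made-up sample: the second data row has a comma inside the address.
df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "address": ["1 Main St", "2 High St, Apt 4"]}
)

# Writing with a pipe separator succeeds.
text = df.to_csv(index=False, sep="|")

# Reading the same text with the default sep="," fails on the line that
# contains a comma, because the header was parsed as a single field.
try:
    pd.read_csv(io.StringIO(text))
    read_error = None
except pd.errors.ParserError as exc:
    read_error = str(exc)

# Reading with the matching separator round-trips the data.
roundtrip = pd.read_csv(io.StringIO(text), sep="|")
print(read_error)
```

If this matches the real setup, adding `sep: "|"` under `load_args` as well as `save_args` in the catalog entry would be the thing to try.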