Hi Team, one quick question: I am using pandas CSVDataSet to... (Kedro #questions)


Ankar Yadav

02/06/2023, 12:19 PM

Hi Team, one quick question: I am using pandas.CSVDataSet to save a file, however when I set `sep` in `save_args`, it gives me an error:

```yaml
prm_customer:
  type: pandas.CSVDataSet
  filepath: ${base_path}/${folders.prm}/
  save_args:
    index: False
    sep: "|"
```

Any idea how to fix this? I am using kedro 0.18.1.

Nok Lam Chan

02/06/2023, 12:28 PM

What error did you get?

Ankar Yadav

02/06/2023, 12:45 PM

`Error tokenizing data. C error: Expected 1 fields in line 3, saw 2`
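A minimal sketch of one way to get exactly this message, assuming the file was saved with `sep="|"` but later read back without `sep` (for example by a load step with no matching `load_args`). The pipe-delimited text is hypothetical. "Error tokenizing data" is raised by the C parser behind `pandas.read_csv`; with the default `sep=","` the header has one field, while line 3 splits on the comma inside the address.

```python
import io

import pandas as pd

# Hypothetical pipe-delimited output: the header has no comma,
# but the address on line 3 does.
PIPE_TEXT = "customer_id|address\n1|1 Main St\n2|2 High St, Springfield\n"


def parse(text: str, **read_kwargs) -> str:
    """Return 'ok' if read_csv parses the text, else the parser's message."""
    try:
        pd.read_csv(io.StringIO(text), **read_kwargs)
    except pd.errors.ParserError as exc:
        return str(exc).strip()
    return "ok"


# Default sep=",": header -> 1 field, line 3 -> 2 fields (split on the comma).
print(parse(PIPE_TEXT))
# Reading with the same separator that was used for saving.
print(parse(PIPE_TEXT, sep="|"))
```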

datajoely

02/06/2023, 1:04 PM

that’s just a badly formatted csv

datajoely

02/06/2023, 1:04 PM

you need to provide an error handling strategy

datajoely

02/06/2023, 1:05 PM

`on_bad_lines`: {'error', 'warn', 'skip'}
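A sketch of what `on_bad_lines` does, assuming pandas 1.3 or later (older versions used `error_bad_lines`/`warn_bad_lines`) and the same hypothetical pipe-delimited text. Since it is a `read_csv` argument, in a catalog entry it would presumably belong under `load_args`, not `save_args`.

```python
import io

import pandas as pd

# Hypothetical pipe-delimited text, read with the default "," separator.
PIPE_TEXT = "customer_id|address\n1|1 Main St\n2|2 High St, Springfield\n"

# "skip" silently drops rows whose field count does not match the header.
skipped = pd.read_csv(io.StringIO(PIPE_TEXT), on_bad_lines="skip")
print(skipped.shape)
print(list(skipped.columns))
```

Here the result is a single column named `customer_id|address` with one row: skipping hides the separator mismatch and loses a customer record rather than fixing anything.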

Ankar Yadav

02/06/2023, 1:16 PM

We have a column with address info which has `,` in it, that's why I need a different separator to save files
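For comparison, a sketch with hypothetical customer data written with the default comma separator. `to_csv` uses `csv.QUOTE_MINIMAL` by default, so a field that contains the delimiter is wrapped in double quotes, and `read_csv` with its defaults reads it back intact. A different separator is one option; relying on quoting is another.

```python
import io

import pandas as pd

# Hypothetical customer data with a comma inside one address.
df = pd.DataFrame(
    {"customer_id": [1, 2], "address": ["1 Main St", "2 High St, Springfield"]}
)

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # default sep="," and QUOTE_MINIMAL quoting
text = buffer.getvalue()
print(text)  # the comma-containing address is written inside double quotes

# Reading back with the defaults restores the original addresses.
round_trip = pd.read_csv(io.StringIO(text))
print(round_trip["address"].tolist())
```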

datajoely

02/06/2023, 1:17 PM

it’s not really a kedro issue, it’s a pandas configuration issue on how to deal with a bad delimiter. Refer to the `read_csv` / `to_csv` docs; all we’re doing is passing through the arguments there.
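To illustrate the pass-through point, here is a hypothetical, simplified stand-in (`PassThroughCSV` is not Kedro's class or code) that forwards `save_args` to `DataFrame.to_csv` and `load_args` to `pandas.read_csv` independently. Under that assumption, `sep` in `save_args` alone does not change how the file is read back.

```python
import io

import pandas as pd


class PassThroughCSV:
    """Hypothetical stand-in for a CSV dataset (not Kedro's implementation).

    save_args go to DataFrame.to_csv and load_args go to pandas.read_csv,
    unchanged; the "file" is kept in memory.
    """

    def __init__(self, save_args=None, load_args=None):
        self.save_args = dict(save_args or {})
        self.load_args = dict(load_args or {})
        self._text = ""

    def save(self, df: pd.DataFrame) -> None:
        buffer = io.StringIO()
        df.to_csv(buffer, **self.save_args)
        self._text = buffer.getvalue()

    def load(self) -> pd.DataFrame:
        return pd.read_csv(io.StringIO(self._text), **self.load_args)


df = pd.DataFrame(
    {"customer_id": [1, 2], "address": ["1 Main St", "2 High St, Springfield"]}
)

# sep only in save_args: saving works, loading hits the tokenizer error.
save_only = PassThroughCSV(save_args={"index": False, "sep": "|"})
save_only.save(df)
load_error = None
try:
    save_only.load()
except pd.errors.ParserError as exc:
    load_error = str(exc).strip()
print(load_error)

# sep in both save_args and load_args: the round trip works.
matched = PassThroughCSV(
    save_args={"index": False, "sep": "|"}, load_args={"sep": "|"}
)
matched.save(df)
print(matched.load())
```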


Ankar Yadav

02/06/2023, 1:20 PM

`to_csv` has a `sep` field; `sep` is supported with `load_args`, but for some reason when I use it with `save_args` it doesn't work.

datajoely

02/06/2023, 1:20 PM

it’s not that arg that’s failing

datajoely

02/06/2023, 1:20 PM

you’re getting a tokenizer issue cos pandas and the underlying C code can’t write at least one of your lines

datajoely

02/06/2023, 1:21 PM

perhaps adding strict quoting will help
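A sketch of strict quoting on the write side, using the standard library's `csv.QUOTE_ALL` with the same hypothetical data. The reader still needs `sep="|"`: quoting makes field boundaries explicit, but it does not tell `read_csv` which separator was used. In a YAML catalog the constant would presumably have to be written as its integer value (`csv.QUOTE_ALL == 1`), assuming the dataset forwards `quoting` unchanged to `to_csv`.

```python
import csv
import io

import pandas as pd

# Hypothetical customer data with a comma inside one address.
df = pd.DataFrame(
    {"customer_id": [1, 2], "address": ["1 Main St", "2 High St, Springfield"]}
)

buffer = io.StringIO()
# QUOTE_ALL wraps every field, header included, in double quotes.
df.to_csv(buffer, index=False, sep="|", quoting=csv.QUOTE_ALL)
text = buffer.getvalue()
print(text)

# Reading back still requires the matching separator.
round_trip = pd.read_csv(io.StringIO(text), sep="|")
print(round_trip["address"].tolist())
```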

Ankar Yadav

02/06/2023, 1:21 PM

Let me try that, thanks

Nok Lam Chan

02/06/2023, 2:11 PM

if you do `df.to_csv(index=False, sep="|")`, I expect it will throw the same error.

Nok Lam Chan

02/06/2023, 2:11 PM

Are you sure this is being thrown when it’s writing data? Or reading?

datajoely

02/06/2023, 2:12 PM

or use a real format like parquet 😛

Ankar Yadav

02/06/2023, 2:13 PM

ya parquet will be the best thing

Ankar Yadav

02/06/2023, 2:14 PM

@Nok Lam Chan currently pandas offers `sep` in `to_csv`, but I'm not sure if it's available in the pandas version used with kedro 0.18.1

Sebastian Pehle

02/06/2023, 5:13 PM

I second Nok's question: is it reading or writing the dataset? I have encountered this error with `read_csv` only. And have you tried saving 'dirty' inside the node with regular `to_csv`?
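A hypothetical helper (`failing_step` is illustrative, not a Kedro or pandas API) that answers the read-or-write question by running the two steps separately on an in-memory buffer, roughly what saving inside the node with plain `to_csv` and reading it back would show. The data and arguments are assumptions mirroring the catalog entry above.

```python
import io

import pandas as pd


def failing_step(df: pd.DataFrame, save_kwargs: dict, load_kwargs: dict) -> str:
    """Hypothetical diagnostic: report which step, write or read, fails."""
    buffer = io.StringIO()
    try:
        df.to_csv(buffer, **save_kwargs)
    except Exception as exc:  # broad on purpose: we only want to locate the step
        return f"write failed: {exc}"
    try:
        pd.read_csv(io.StringIO(buffer.getvalue()), **load_kwargs)
    except Exception as exc:
        return f"read failed: {str(exc).strip()}"
    return "ok"


# Hypothetical customer data with a comma inside one address.
df = pd.DataFrame(
    {"customer_id": [1, 2], "address": ["1 Main St", "2 High St, Springfield"]}
)

# Same save_args as the catalog entry, no sep on the read side.
print(failing_step(df, {"index": False, "sep": "|"}, {}))
# Matching sep on both sides.
print(failing_step(df, {"index": False, "sep": "|"}, {"sep": "|"}))
```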
