Hi everyone Hope you re all doing well slightly smiling face Kedro #questions

Marc Gris

11/21/2023, 10:28 AM

Hi everyone, hope you’re all doing well 🙂 I am stumbling on something that seems odd (to me). As you all already know, `pd.read_csv()` has an `nrows` param that takes an `int` as argument but is set by default to `None`, in which case the full dataset is loaded. Therefore, I thought I could leverage that to give myself the ability to experiment / iterate quickly by downsampling my datasets at runtime with something like:

"raw_{table}":
  type: pandas.CSVDataset
  filepath: data/01_raw/tables/raw_{table}.csv
  load_args:
    nrows: "${runtime_params:nrows, None}"

While this works perfectly with `kedro run --params nrows=100`, when leaving the param unspecified, I end up with:

DatasetError: Failed while loading data from data set
CSVDataset(filepath=/Users/marc/DODOBIRD/DODO_CODE/kedro-etl/data/01_raw/tables/raw_users.csv, load_args={'nrows': None}, protocol=file, save_args={'index': False}).
'nrows' must be an integer >=0

and yet running `pd.read_csv("data.csv", nrows=None)` simply returns the full dataset as expected. Is this a bug? Or am I missing something / doing something wrong? Thanks for your input, M.

Marc Gris

11/21/2023, 10:31 AM

Untitled

Iñigo Hidalgo

11/21/2023, 10:34 AM

Could you put a breakpoint in the pandas validate_integer function to see what the "val" is arriving as? Because that definitely seems strange

Ankita Katiyar

11/21/2023, 10:36 AM

Try -

nrows: "${runtime_params:nrows, null}"

Ankita Katiyar

11/21/2023, 10:39 AM

Since the catalog is YAML, `null` should work instead of `None`, which might be getting treated as a string.

this 2
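A minimal sketch of why a string default would trigger that exact error. `check_nrows` is a hypothetical, simplified stand-in for the integer check pandas applies to `nrows`; it is not the library's code, only an illustration of a check that accepts a real `None` or a non-negative integer and rejects everything else, including the string `"None"`:

```python
def check_nrows(val):
    """Simplified, hypothetical stand-in for pandas' nrows validation."""
    if val is None:
        # A real None means "no limit": load the full dataset.
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(val, int) and not isinstance(val, bool) and val >= 0:
        return val
    raise ValueError("'nrows' must be an integer >=0")


print(check_nrows(None))  # a real None passes through
print(check_nrows(100))   # a valid row count passes through

try:
    check_nrows("None")   # the string "None" is neither None nor an int
except ValueError as err:
    print(err)
```

If the default in `"${runtime_params:nrows, None}"` reaches the dataset as the string `"None"`, this is the branch it would hit, whereas `null` resolves to a real `None`.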

Iñigo Hidalgo

11/21/2023, 10:41 AM

From the traceback it looks like it's passing None correctly though, right? But let's see if that fixes it 🙂

Marc Gris

11/21/2023, 10:44 AM

Thx to you both for your messages. I’m trying the `null` instead of `None` thing right now. 👍🏼

Marc Gris

11/21/2023, 10:52 AM

It worked!!! 🎊 Thanks @Ankita Katiyar. But still… As @Iñigo Hidalgo pointed out… It is really strange and confusing that the traceback showed `None`… If I have time, I’ll put a breakpoint to inspect this… But right now, I have to “deliver” 😅 Thanks again. Good day to you both. M.

🥳 4

sadcat 1

Iñigo Hidalgo

11/23/2023, 11:55 PM

To answer my own question: it seems strings get printed without quotes in the dataset repr, so the string "None" looks the same as a real None in the traceback.

👍🏼 1

👍 1
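A small sketch of how that kind of repr can hide the difference. `describe` is a hypothetical helper, not Kedro's actual implementation; it assumes only that the dataset description quotes keys and formats values with `str()`:

```python
def describe(load_args):
    """Hypothetical repr helper: keys are quoted, values go through str()."""
    body = ", ".join(f"'{key}': {value}" for key, value in load_args.items())
    return "{" + body + "}"


real_none = describe({"nrows": None})
string_none = describe({"nrows": "None"})

print(real_none)                 # {'nrows': None}
print(string_none)               # {'nrows': None}
print(real_none == string_none)  # True: the two are indistinguishable

# Python's own repr keeps the quotes, which would have exposed the string.
print(repr({"nrows": "None"}))   # {'nrows': 'None'}
```

Under that assumption, `load_args={'nrows': None}` in the error message does not prove that a real `None` was passed.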

Marc Gris

11/24/2023, 6:59 AM

Thanks for being more curious / thorough than me and digging deeper into this 😉

Iñigo Hidalgo

11/24/2023, 9:49 AM

I was actually looking into something different and saw that the "nonsense" was being represented without the quotes and remembered this thread 😆
