# questions
m
Hi everyone, hope you’re all doing well 🙂 I am stumbling on something that seems odd (to me). As you all already know, `pd.read_csv()` has an `nrows` param that takes an `int` as argument but defaults to `None`, in which case the full dataset is loaded. So I thought I could leverage that to experiment / iterate quickly by downsampling my datasets at runtime with something like:
"raw_{table}":
  type: pandas.CSVDataset
  filepath: data/01_raw/tables/raw_{table}.csv
  load_args:
    nrows: "${runtime_params:nrows, None}"
While this works perfectly with `kedro run --params nrows=100`, when leaving the param unspecified, I end up with
DatasetError: Failed while loading data from data set
CSVDataset(filepath=/Users/marc/DODOBIRD/DODO_CODE/kedro-etl/data/01_raw/tables/raw_users.csv, load_args={'nrows': None}, protocol=file, save_args={'index': False}).
'nrows' must be an integer >=0
and yet running `pd.read_csv("data.csv", nrows=None)` simply returns the full dataset as expected. Is this a bug, or am I missing something / doing something wrong? Thanks for your input, M.
i
Could you put a breakpoint in the pandas validate_integer function to see what the "val" is arriving as? Because that definitely seems strange
a
Try -
nrows: "${runtime_params:nrows, null}"
Since the catalog is YAML, `null` should work instead of `None`, which might be getting treated as a string
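A minimal sketch of why a string would trip the check, assuming the validation behaves the way the error message suggests (real `None` or a non-negative integer is accepted, anything else is rejected). `check_nrows` is a hypothetical stand-in written for illustration, not pandas' actual function:

```python
from typing import Optional, Union


def check_nrows(val: Union[int, str, None]) -> Optional[int]:
    """Hypothetical stand-in for an nrows check: accept None or an int >= 0."""
    if val is None:
        return None  # a real None means "load every row"
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(val, int) and not isinstance(val, bool) and val >= 0:
        return val
    raise ValueError("'nrows' must be an integer >=0")


print(check_nrows(None))  # a real None passes
print(check_nrows(100))   # a non-negative int passes

try:
    check_nrows("None")   # the four-character string "None" does not
except ValueError as exc:
    print(exc)
```

If the config resolver hands over the text `None` rather than a null value, the loader would see the third case, which would match the error above.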
i
From the traceback it looks like it's passing None correctly though, right? But let's see if that fixes it 🙂
m
Thx to you both for your messages. I’m trying the `null` instead of `None` thing right now. 👍🏼
It worked!!! 🎊 Thanks @Ankita Katiyar. But still… as @Iñigo Hidalgo pointed out, it is really strange and confusing that the traceback showed `None`… If I have time, I’ll put a breakpoint to inspect this, but right now I have to “deliver” 😅 Thanks again. Good day to you both. M.
i
To answer my own question: it seems strings get printed without quotes in the dataset repr, so the string "None" shows up looking exactly like a real `None`.
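A rough sketch of how that can happen, assuming the repr is built by formatting values with `str()` rather than `repr()`. The `describe` helper below is hypothetical and only mimics the shape of the message in the traceback; it is not Kedro's actual code:

```python
def describe(name: str, **attrs) -> str:
    """Hypothetical repr builder that formats leaf values with str()."""

    def fmt(value) -> str:
        if isinstance(value, dict):
            inner = ", ".join(f"'{k}': {fmt(v)}" for k, v in value.items())
            return "{" + inner + "}"
        return str(value)  # str() drops the quotes around strings

    body = ", ".join(f"{key}={fmt(val)}" for key, val in attrs.items())
    return f"{name}({body})"


with_none = describe("CSVDataset", load_args={"nrows": None})
with_string = describe("CSVDataset", load_args={"nrows": "None"})

print(with_none)
print(with_string)
print(with_none == with_string)   # the two descriptions are indistinguishable
print(repr({"nrows": "None"}))    # the built-in repr keeps the quotes
```

With a builder like this, the message alone cannot tell you which of the two values was actually passed, so checking `type(val)` at a breakpoint is the more reliable test.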
m
Thanks for being more curious / thorough than me and digging deeper into this 😉
i
I was actually looking into something different and saw that the "nonsense" was being represented without the quotes and remembered this thread 😆