#questions

Richard Purvis

02/27/2024, 7:36 PM
Hello, is it possible to save data from a pandas iterator to a partitioned dataset in chunks? For example, reading from `pd.read_csv` with a `chunksize` arg. I have seen the lazy save for a partitioned dataset article (link). However, this requires a pre-defined dictionary with callable items, and if you are iterating through chunks you wouldn't be able to predefine keys. CC @Yury Fedotov
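For illustration, a minimal sketch of the pattern being described (the file path and chunk size are placeholders):
```python
import pandas as pd

# Reading a large CSV lazily: each iteration yields one DataFrame chunk.
# The number of chunks isn't known up front, so the partition keys can't
# be predefined the way the PartitionedDataset lazy-save example expects.
for chunk in pd.read_csv("data/01_raw/big_file.csv", chunksize=100_000):
    ...  # process one chunk at a time
```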

datajoely

02/28/2024, 2:16 AM
Can you not lazily create a dictionary of callables to do this, where your keys are just enumerated?

Juan Luis

02/28/2024, 6:45 AM
hi @Richard Purvis, I think this is similar to @Biel Stela's request here https://kedro-org.slack.com/archives/C03RKP2LW64/p1706716264070519
just so that I understand, would your node function be `return`ing the individual chunks? `yield` them? something else?

Richard Purvis

02/28/2024, 1:23 PM
@Juan Luis It would be yielding them. @datajoely I'm not sure what you mean by enumerate, as in the Python `enumerate()` function?

datajoely

02/28/2024, 1:24 PM
yeah just so you can have a lazily defined set of keys for each chunk
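For illustration, one way that enumerated-keys suggestion could look (the function name, path, and chunk size are placeholders):
```python
import pandas as pd

def create_partitions(filepath: str) -> dict:
    # Enumerate the chunks so each partition gets a generated key.
    # The default argument binds the current chunk to each lambda
    # (otherwise every callable would close over the last chunk).
    reader = pd.read_csv(filepath, chunksize=100_000)
    return {
        f"part_{i:04d}": (lambda chunk=chunk: chunk)
        for i, chunk in enumerate(reader)
    }
```
One caveat: the dict comprehension consumes the reader eagerly, so every chunk is held in memory via the lambda defaults; the generator approach discussed below avoids that.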

Juan Luis

02/28/2024, 1:39 PM
we have an example in the docs with a generator node using `yield` and a custom dataset, please have a look: https://docs.kedro.org/en/stable/nodes_and_pipelines/nodes.html#saving-data-with-generators
if this isn't quite it, let's continue the conversation 🙂
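For illustration, a minimal sketch of that generator-node pattern (the function name, path, and chunk size are placeholders); per the linked docs, each yielded chunk is saved by the dataset as it is produced:
```python
import pandas as pd

def process_large_csv(filepath: str):
    # A generator node: Kedro saves each yielded value as it arrives,
    # so the full file never has to be held in memory at once.
    for chunk in pd.read_csv(filepath, chunksize=100_000):
        yield chunk
```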

Richard Purvis

02/29/2024, 5:06 PM
@Juan Luis This appears to be exactly what I need, thank you!

Juan Luis

02/29/2024, 5:13 PM
amazing 🙌🏼