# questions
Richard Purvis:
Hello, is it possible to save data from a pandas iterator to a partitioned dataset in chunks? For example, reading from `pd.read_csv` with a `chunksize` arg. I have seen the lazy save for a partitioned dataset article (link). However, this requires a pre-defined dictionary with callable items, and if you are iterating through chunks you wouldn't be able to predefine keys. CC @Yury Fedotov
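For context, passing `chunksize` makes `pd.read_csv` return an iterator of DataFrames instead of a single frame. A minimal sketch (file name and chunk size are illustrative):

```python
import pandas as pd

# chunksize turns read_csv into an iterator that yields one
# DataFrame of up to 10_000 rows at a time
for chunk in pd.read_csv("data.csv", chunksize=10_000):
    print(len(chunk))  # process or save each chunk here
```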
datajoely:
Can you not lazily create a dictionary of callables to do this, where your keys are just enumerated?
Juan Luis:
hi @Richard Purvis, I think this is similar to @Biel Stela's request here https://kedro-org.slack.com/archives/C03RKP2LW64/p1706716264070519
just so that I understand, would your node function be `return`ing the individual chunks? `yield` them? something else?
Richard Purvis:
@Juan Luis It would be yielding them. @datajoely I'm not sure what you mean by enumerate, as in the Python `enumerate()` function?
datajoely:
yeah, just so you can have a lazily defined set of keys for each chunk
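A rough sketch of that idea, assuming the output is a Kedro `PartitionedDataset` (which accepts a dict whose values are zero-argument callables for lazy saving). All names here are hypothetical, and each callable re-reads only its own slice so the chunks are never all in memory at once:

```python
import math

import pandas as pd


def chunked_partitions(csv_path: str, chunksize: int = 10_000) -> dict:
    # count data rows up front so the keys can be enumerated
    with open(csv_path) as f:
        n_rows = sum(1 for _ in f) - 1  # subtract the header line
    n_chunks = math.ceil(n_rows / chunksize)
    return {
        f"part-{i:05d}": (
            # bind i at definition time; the slice is only read when
            # PartitionedDataset invokes the callable at save time
            lambda i=i: pd.read_csv(
                csv_path,
                skiprows=range(1, 1 + i * chunksize),  # keep header row 0
                nrows=chunksize,
            )
        )
        for i in range(n_chunks)
    }
```

Returning this dict from a node whose output is a `PartitionedDataset` would then give one partition per chunk without materializing them all at once.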
Juan Luis:
we have an example in the docs with a generator node using `yield` and a custom dataset, please have a look: https://docs.kedro.org/en/stable/nodes_and_pipelines/nodes.html#saving-data-with-generators
if this isn't quite it, let's continue the conversation šŸ™‚
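The linked docs example boils down to a node that yields chunks one at a time. A minimal sketch of the node side (the path and chunk size are illustrative):

```python
from typing import Iterator

import pandas as pd


# Kedro saves each yielded DataFrame as soon as it is produced,
# so the full dataset never has to fit in memory at once
def stream_csv(filepath: str) -> Iterator[pd.DataFrame]:
    for chunk in pd.read_csv(filepath, chunksize=10_000):
        yield chunk
```

The docs pair this with a custom dataset that appends each yielded chunk at save time, so the pieces end up in a single output.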
šŸ‘ 1
Richard Purvis:
@Juan Luis This appears to be exactly what I need, thank you!
Juan Luis:
amazing šŸ™ŒšŸ¼