# questions
Question regarding saving data which is too big for RAM. I have a use case where a Kedro pipeline creates an RDF file. For the sake of this question, an RDF file is simply a text file; it is used to bulk-load data into graph databases. My go-to was to use
for this purpose, but this means that my node would have to return a string with the entire contents of this RDF file. Because the file is going to be so big, I want to write it in batches: create 10% of the contents, write them, create another 10%, write them, and so on until reaching 100%. Is there a way in Kedro to achieve something like that? I looked at
but it seems it has nothing to do with this use case
You can use
, and lazily save each partition.
But I need them all in the same file
the tool that loads them into the graph expects a single file
I would subclass
and get it to write chunks using a generator
@datajoely interesting. so
would accept a generator instead of a string?
@datajoely but in this example it's not accepting a generator - it accepts the actual data structure, which is a dataframe. So my node should be a generator, but
still should accept
and Kedro will unpack it, if I understand correctly
Yes, Kedro unpacks the generator and calls
for each item - your dataset implementation can just append to a single file. Example:
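A minimal sketch of the idea, written without importing Kedro so it runs on its own. The names `AppendingTextDataset`, `generate_rdf` and `run_generator_node` are hypothetical. In a real project the class would subclass Kedro's abstract dataset base class and the Kedro runner would do the looping that `run_generator_node` only imitates here:

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterator


class AppendingTextDataset:
    """Hypothetical stand-in for a custom Kedro dataset.

    It mirrors the usual _load / _save / _describe method names, but does
    not inherit from Kedro so that the sketch stays self-contained.
    """

    def __init__(self, filepath: str) -> None:
        self._filepath = Path(filepath)
        self._first_write = True  # truncate once, then append

    def _save(self, data: str) -> None:
        # The first chunk overwrites any stale file; later chunks are appended,
        # so only one chunk is held in memory at a time.
        mode = "w" if self._first_write else "a"
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        with self._filepath.open(mode, encoding="utf-8") as handle:
            handle.write(data)
        self._first_write = False

    def _load(self) -> str:
        return self._filepath.read_text(encoding="utf-8")

    def _describe(self) -> Dict[str, str]:
        return {"filepath": str(self._filepath)}


def generate_rdf(n_batches: int = 10, triples_per_batch: int = 3) -> Iterator[str]:
    """Generator node: yields the file contents one batch at a time."""
    for batch in range(n_batches):
        lines = [
            f'<urn:s:{batch}:{i}> <urn:p:value> "{i}" .\n'
            for i in range(triples_per_batch)
        ]
        yield "".join(lines)


def run_generator_node(
    node_func: Callable[[], Iterator[str]], dataset: AppendingTextDataset
) -> None:
    """Imitates the runner: save is called once per yielded chunk."""
    for chunk in node_func():
        dataset._save(chunk)


if __name__ == "__main__":
    out_path = Path(tempfile.mkdtemp()) / "bulk.nt"
    dataset = AppendingTextDataset(str(out_path))
    run_generator_node(generate_rdf, dataset)
    print(dataset._describe(), len(dataset._load().splitlines()), "lines")
```

The node still declares a single output, and the dataset's save method still receives a plain string. The only change is that it receives one batch per call and appends it, so the loader tool ends up with a single file.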
@marrrcin we meet again 😄 (we worked together on kedro-azure; we donated the datasets for Azure). Thanks, I'll try it out