I have a question about passing objects between th...
# questions
p
I have a question about passing objects between the different nodes in a pipeline in Kedro. We are trying to use Kedro in a graph machine learning project, so we're dealing with some custom classes holding information (on triples, nodes, some metadata). At some point we create our Graph class instances, which hold the data. Passing a single Graph between nodes is not a problem and it works. However, we have a node which creates multiple versions of Graph instances (for cross-validation purposes). To avoid overwhelming the RAM we went for a generator object, which we wanted to pass from that node to a "cross-validation node", so that this next node could call `next()` to get only what it needs. The problem is that when we pass that generator between the nodes, Kedro seems to internally run `next()` on all the elements of that generator and load everything into RAM, while spending a lot of time on it, like this:
[10/04/23 15:51:26] INFO     Saving data to 'configured_graph_generator' (MemoryDataset)...                                              data_catalog.py:531
[10/04/23 15:53:58] INFO     Saving data to 'configured_graph_generator' (MemoryDataset)...                                              data_catalog.py:531
[10/04/23 15:56:28] INFO     Saving data to 'configured_graph_generator' (MemoryDataset)...                                              data_catalog.py:531
[10/04/23 15:58:58] INFO     Saving data to 'configured_graph_generator' (MemoryDataset)...                                              data_catalog.py:531
[10/04/23 16:01:28] INFO     Saving data to 'configured_graph_generator' (MemoryDataset)...                                              data_catalog.py:531
[10/04/23 16:03:57] INFO     Saving data to 'configured_graph_generator' (MemoryDataset)...                                              data_catalog.py:531
[10/04/23 16:06:26] INFO     Saving data to 'configured_graph_generator' (MemoryDataset)...                                              data_catalog.py:531
[10/04/23 16:08:56] INFO     Saving data to 'configured_graph_generator' (MemoryDataset)...                                              data_catalog.py:531
Is there a way to avoid this? Is this a bug or a consequence of misusing Kedro? 🙂
👀 1
n
Kedro does support returning a generator from a node for lazy saving: it iterates over the generator and saves each yielded chunk, which is why you see multiple log lines there. I'm not sure whether this is a bug or a conflict with the design. https://docs.kedro.org/en/stable/nodes_and_pipelines/nodes.html#how-to-use-generator-functions-in-a-node
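To make that concrete, here is a rough sketch in plain Python, not Kedro's actual code. `save_output` is a simplified stand-in I made up for a runner that saves iterator outputs chunk by chunk, and `graph_versions` / `make_graph_factory` are hypothetical names, with strings standing in for your Graph instances. It also shows one possible workaround, assuming a non-iterator output is saved once as a single object: return a factory that creates the generator, and let the downstream node call it.

```python
from collections.abc import Iterator
from typing import Any, Callable


def graph_versions(n_folds: int) -> Iterator[str]:
    """Hypothetical stand-in for the node that yields one Graph per fold."""
    for fold in range(n_folds):
        yield f"graph-fold-{fold}"


def save_output(output: Any, store: list) -> None:
    """Simplified stand-in (an assumption) for saving a node output."""
    if isinstance(output, Iterator):
        # A generator is an Iterator, so it is exhausted here, one save per chunk.
        for chunk in output:
            store.append(chunk)
    else:
        # Anything else is saved once, as a single object.
        store.append(output)


def make_graph_factory(n_folds: int) -> Callable[[], Iterator[str]]:
    """Return a callable instead of a generator, so nothing is iterated on save."""
    return lambda: graph_versions(n_folds)


# Returning the generator directly: every fold is built during saving.
eager_store: list = []
save_output(graph_versions(3), eager_store)

# Returning a factory: only the callable is stored; the consumer pulls lazily.
lazy_store: list = []
save_output(make_graph_factory(3), lazy_store)
first_graph = next(lazy_store[0]())
```

With the factory version, the cross-validation node would receive the callable, call it to get a fresh generator, and use `next()` on that. Whether this plays well with your catalog setup is something you'd need to check on your side.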
👍 3