# questions

Adrien

11/23/2023, 6:18 PM
Hi Kedro team! Is it possible to run the save operations asynchronously and/or multithreaded when lazily saving a partitioned dataset?

datajoely

11/23/2023, 6:19 PM
Does this work?
kedro run --async
Load and save node inputs and outputs asynchronously with threads

Adrien

11/23/2023, 6:20 PM
No, from what I've read and tested, that only loads/saves multiple datasets asynchronously
Not each partition of a single dataset

datajoely

11/23/2023, 6:20 PM
Oh yes gotcha
I think the lazy saving is just a for loop
you'd need to subclass the partitioned dataset class to add that
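A minimal sketch of what that subclass's save logic could look like. This is not Kedro's actual `PartitionedDataset` internals: `save_fn` is a hypothetical stand-in for the per-partition write, and the dict follows the lazy-saving convention where values may be callables that return the data.

```python
from concurrent.futures import ThreadPoolExecutor

def save_partitions_threaded(partitions, save_fn, max_workers=4):
    """Save a dict of partitions concurrently instead of in a for loop.

    partitions: dict of partition_id -> data, or -> callable returning
    the data (the lazy-saving convention for partitioned datasets).
    save_fn: hypothetical callable that writes one partition.
    """
    def _save_one(item):
        partition_id, partition_data = item
        if callable(partition_data):  # lazy partition: materialise it now
            partition_data = partition_data()
        save_fn(partition_id, partition_data)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces iteration so worker exceptions are re-raised here
        list(pool.map(_save_one, partitions.items()))
```

Threads suit this because partition saves are usually I/O-bound, so the GIL is not the bottleneck.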

Adrien

11/23/2023, 6:21 PM
Mmmh OK, but when I return the dict, is it processed in the partition class or in the after_node_run hook?
I think I read in the docs that there is a trigger on the hook

datajoely

11/23/2023, 6:23 PM
partition class

Adrien

11/23/2023, 6:23 PM
Ok thanks !
Maybe I'll make a PR to include it in the plugin
d

datajoely
it’s a for loop
so you could maybe (1) chunk it and use multi-processing (2) try something like joblib
Just know if you use
ParallelRunner
it may not work nicely
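A hedged sketch of option (2) with joblib. `save_fn` is the same hypothetical per-partition writer as above; `prefer="threads"` picks joblib's threading backend, which avoids pickling the lazy callables and suits I/O-bound saves.

```python
from joblib import Parallel, delayed

def save_partitions_joblib(partitions, save_fn, n_jobs=4):
    """Save lazy partitions in parallel using joblib.

    partitions: dict of partition_id -> data or -> callable returning
    the data (lazy-saving convention). save_fn writes one partition.
    """
    def _save_one(partition_id, partition_data):
        if callable(partition_data):  # lazy partition: materialise it now
            partition_data = partition_data()
        save_fn(partition_id, partition_data)

    # prefer="threads" keeps everything in-process, so no pickling of
    # the lazy callables is needed
    Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(_save_one)(pid, data) for pid, data in partitions.items()
    )
```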

Adrien

11/23/2023, 6:25 PM
Oh yes, because all the threads will already be allocated? Right?

datajoely

11/23/2023, 6:25 PM
more that Python and concurrency is just painful

Adrien

11/23/2023, 6:26 PM
Ok thanks !