# Questions
a
Hi kedro team! Is it possible to run the save operation asynchronously and/or multithreaded when lazy-saving a partitioned dataset?
d
Does this work?
`kedro run --async`
> Load and save node inputs and outputs asynchronously with threads
a
No, from what I read and tested, it only loads/saves multiple datasets asynchronously,
not each partition of a dataset
d
Oh yes gotcha
I think the lazy saving is just a for loop
you'd need to subclass the partitioned dataset class to add that
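For context, the idea could be sketched like this with the standard library. This is a minimal, self-contained sketch, not Kedro's actual implementation: `save_partitions_threaded` and `save_partition` are hypothetical names standing in for what a subclass's save logic might do.

```python
from concurrent.futures import ThreadPoolExecutor

def save_partitions_threaded(partitions, save_partition, max_workers=4):
    """Save a dict of partitions concurrently instead of in a plain for loop.

    partitions: mapping of partition id -> data, or -> callable (lazy saving)
    save_partition: hypothetical callback that persists one partition
    """
    def _save_one(item):
        partition_id, data = item
        if callable(data):  # lazy saving: materialise the partition on demand
            data = data()
        save_partition(partition_id, data)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces evaluation so any exception is raised here
        list(pool.map(_save_one, partitions.items()))
```

A subclass would put something like this inside its save method, calling the parent class's per-partition save instead of the `save_partition` callback.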
a
Mmmh okok, but when I return the dict, is it processed in the partition class or in the after_node_run hook?
I think I read in the docs that there is a trigger on the hook
d
partition class
a
Ok thanks !
Maybe I'll make a PR to include it in the plugin
it’s a for loop
so you could maybe (1) chunk it and use multiprocessing, or (2) try something like joblib
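Option (1) could start with a helper like this. It's a sketch with a hypothetical name (`chunked`); each chunk could then be handed to a `multiprocessing.Pool` worker or a `joblib.Parallel` task.

```python
from itertools import islice

def chunked(mapping, size):
    """Split a partition dict into chunks of `size` items each,
    so each chunk can be saved by one worker process."""
    it = iter(mapping.items())
    while chunk := dict(islice(it, size)):
        yield chunk
```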
Just know that if you use `ParallelRunner` it may not work nicely
a
Oh yes, because all threads will already be allocated? Right?
d
more that Python and concurrency is just painful
a
Ok thanks !