Do you use `kedro run --runner ParallelRunner` to s...
# user-research
n
Do you use `kedro run --runner ParallelRunner` to speed up your pipeline? If not, why? (Other than Spark not working with multiprocessing.)
m
Because libraries like xgboost, scikit-learn + joblib, polars, … already use parallel processing… I do use the async loading of catalog entries often!
👍 2
👀 2
m
Because multiprocessing is 💩
😂 3
p
Because when working with a typical single GPU machine you don't want various nodes to try to access the GPU at the same time, which leads to CUDA crashes. This is specific to GPU-heavy workflows though.
n
@Piotr Grabowski good point about GPU, and there is no finer control over which nodes should access the GPU. @Matthias Roels good point about async, and libraries are handling this themselves already. Tho I remember pandas was pretty bad at using all your cores.
It sounds like most people don't really need `ParallelRunner` and rely on the libraries themselves (more flexible I guess). My feeling is `async`/`CachedDataset`/`kedro-accelerator` are more likely to bring performance gains. Cc @Deepyaman Datta
👍 2
j
For partitioned datasets I often use async file loading from catalog too, which works very well
💯 1
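For anyone skimming the thread later: the async partition loading mentioned above boils down to reading files concurrently in threads instead of one after another. Below is a minimal standard-library sketch of that idea. It is not Kedro's implementation, and `load_partition`, `load_partitions_concurrently`, `demo` and the `part_*.csv` file names are all hypothetical, made up for illustration.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_partition(path: Path) -> str:
    """Read one partition file (hypothetical stand-in for a dataset load)."""
    return path.read_text()


def load_partitions_concurrently(folder: Path, max_workers: int = 4) -> dict[str, str]:
    """Load every *.csv file in `folder` using a thread pool.

    Threads suit this I/O-bound work: while one thread waits on the
    disk or network, the others can keep reading.
    """
    paths = sorted(folder.glob("*.csv"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `paths`.
        contents = list(pool.map(load_partition, paths))
    return {path.stem: text for path, text in zip(paths, contents)}


def demo() -> dict[str, str]:
    """Write three small fake partitions to a temp folder and load them."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for i in range(3):
            (folder / f"part_{i}.csv").write_text(f"id,value\n{i},{i * 10}\n")
        return load_partitions_concurrently(folder)


if __name__ == "__main__":
    for name, text in demo().items():
        print(name, repr(text))
```

Because the work is I/O-bound, threads are enough here and no extra processes are spawned, which is also why this approach sidesteps the multiprocessing and GPU contention concerns raised earlier in the thread.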