# questions
y
Hey, I'm trying to better understand when it's reasonable / feasible to use `ParallelRunner` instead of the default `SequentialRunner`. Are these conclusions correct?

1. Worst case scenario, `ParallelRunner` would just yield the same speed as `SequentialRunner`. It can't produce different results, and it manages the execution order so that if a node expects outputs from several other nodes, it waits until all of them have been generated.
2. `ParallelRunner` shines when a pipeline does many similar operations on some already-available input, and it's just a matter of compute time to do each of those operations. In other words, those operations do not depend on each other sequentially. A pipeline consisting of a few namespace pipelines is likely a good candidate for that runner.

And a question:

3. When would you avoid using `ParallelRunner`?
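To make points 1 and 2 concrete, here is a toy, stdlib-only scheduler. It is not Kedro's implementation: `Node`, `run_sequential` and `run_parallel` are hypothetical names, and the sketch uses a thread pool only so it runs anywhere, whereas Kedro's `ParallelRunner` is process-based. It illustrates the property from point 1: a node is submitted only once all of its inputs exist, so the parallel run produces the same outputs as the sequential one, while independent nodes (`left` and `right` here) are free to run at the same time.

```python
from __future__ import annotations

from concurrent.futures import FIRST_COMPLETED, Executor, Future, ThreadPoolExecutor, wait
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a pipeline node: a function, its named inputs, one named output."""
    func: Callable[..., Any]
    inputs: tuple[str, ...]
    output: str


def _ready(node: Node, data: dict[str, Any]) -> bool:
    # A node may run only once every one of its inputs has been produced.
    return all(name in data for name in node.inputs)


def run_sequential(nodes: list[Node], data: dict[str, Any]) -> dict[str, Any]:
    data, pending = dict(data), list(nodes)
    while pending:
        node = next((n for n in pending if _ready(n, data)), None)
        if node is None:
            raise ValueError("some node inputs can never be produced")
        data[node.output] = node.func(*(data[i] for i in node.inputs))
        pending.remove(node)
    return data


def run_parallel(nodes: list[Node], data: dict[str, Any], executor: Executor) -> dict[str, Any]:
    data, pending = dict(data), list(nodes)
    running: dict[Future, Node] = {}
    while pending or running:
        # Submit every node whose inputs are all available; the rest keep waiting.
        for node in [n for n in pending if _ready(n, data)]:
            running[executor.submit(node.func, *(data[i] for i in node.inputs))] = node
            pending.remove(node)
        if not running:
            raise ValueError("some node inputs can never be produced")
        # Block until at least one running node finishes, then record its output.
        done, _ = wait(running, return_when=FIRST_COMPLETED)
        for fut in done:
            data[running.pop(fut).output] = fut.result()
    return data


# "total" is listed first but needs both "left" and "right", so it runs last.
nodes = [
    Node(lambda a, b: a + b, ("left", "right"), "total"),
    Node(lambda x: x * 2, ("raw",), "left"),
    Node(lambda x: x * 3, ("raw",), "right"),
]

with ThreadPoolExecutor(max_workers=2) as pool:
    parallel_result = run_parallel(nodes, {"raw": 5}, pool)
sequential_result = run_sequential(nodes, {"raw": 5})

print(parallel_result == sequential_result)  # True
print(parallel_result["total"])  # 25
```

The dependency check is the whole trick: parallelism only changes *when* a ready node runs, never *whether* it sees all of its inputs, which is why results match the sequential run.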
d
I think technically the worst case scenario of `ParallelRunner` is ever so slightly worse than `SequentialRunner`, since there is overhead from pooling, splitting and reconciling the processes, but it shouldn't be noticeable.
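A back-of-the-envelope model of that worst case, under loudly stated assumptions: unlimited workers, and a fixed, hypothetical per-node overhead standing in for process start-up, pickling inputs and collecting outputs. Parallel wall time is then the longest dependency chain (with the overhead paid per node), while sequential time is the plain sum. `estimate_times` is an illustrative name, and the numbers are made up; the model ignores worker limits and anything Kedro-specific.

```python
from __future__ import annotations

from functools import lru_cache
from typing import Mapping


def estimate_times(
    durations: Mapping[str, float],
    deps: Mapping[str, tuple[str, ...]],
    overhead: float,
) -> tuple[float, float]:
    """Return (sequential, parallel) wall-clock estimates under a toy model.

    Assumes unlimited workers and a fixed per-node overhead in the parallel case.
    """

    @lru_cache(maxsize=None)
    def finish(node: str) -> float:
        # A node starts when its slowest dependency finishes.
        start = max((finish(d) for d in deps.get(node, ())), default=0.0)
        return start + durations[node] + overhead

    sequential = sum(durations.values())
    parallel = max(finish(n) for n in durations)
    return sequential, parallel


# Worst case: a linear chain a -> b -> c, so nothing can overlap.
chain = estimate_times({"a": 1.0, "b": 1.0, "c": 1.0}, {"b": ("a",), "c": ("b",)}, overhead=0.05)
# Best case: three independent nodes, e.g. one per namespace pipeline.
fan_out = estimate_times({"a": 1.0, "b": 1.0, "c": 1.0}, {}, overhead=0.05)

print(chain)    # sequential 3.0, parallel about 3.15
print(fan_out)  # sequential 3.0, parallel about 1.05
```

For the chain the parallel estimate lands just above the sequential one, which is the "ever so slightly worse" case; for independent nodes it approaches the slowest single node plus overhead, which is the situation described in point 2.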
On #3: in distributed execution contexts like Spark, Snowpark or Dask, you should use `ThreadRunner` instead.
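One plausible way to see why (in Kedro this would be selected with e.g. `kedro run --runner=ThreadRunner`): with Spark, Snowpark or Dask the heavy compute happens on the cluster, so the Python process driving the pipeline mostly submits work and waits. Threads are enough to overlap those waits, without paying for separate processes or copying driver-side objects into them. The sketch below simulates that with `time.sleep`, which releases the GIL much like a thread blocked on a cluster result; `submit_remote_job` and the job names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def submit_remote_job(name: str, seconds: float) -> str:
    """Hypothetical stand-in for triggering a cluster job and waiting for its result.

    time.sleep releases the GIL, roughly like a driver thread blocked on the cluster.
    """
    time.sleep(seconds)
    return f"{name} done"


jobs = [("sales", 0.2), ("customers", 0.2), ("products", 0.2), ("stores", 0.2)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    # pool.map keeps results in the same order as the input jobs.
    results = list(pool.map(lambda job: submit_remote_job(*job), jobs))
elapsed = time.perf_counter() - start

print(results)
print(f"{elapsed:.2f}s for {len(jobs)} jobs")
```

Because the four waits overlap, the elapsed time should come out near a single job's 0.2s rather than the 0.8s a sequential loop would take, even though only one Python process is involved.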