Robert Lugg
08/08/2024, 11:21 PM
Deepyaman Datta
08/08/2024, 11:27 PM
ThreadRunner or ParallelRunner implementation; they're all rather similar).
I don't have any familiarity with these grid systems, but it seems you can even run Dask Distributed code on them, so using DaskRunner on top might also be a possibility.
Robert Lugg
08/09/2024, 1:00 AM
Robert Lugg
08/09/2024, 1:22 AM
Laurens Vijnck
08/09/2024, 7:47 AM
Deepyaman Datta
08/09/2024, 1:11 PM
> That runner then loops through pipeline.nodes and then does "something". What is that something? Would it be starting a process on a remote machine for each node?
It depends on the runner implementation, but often this amounts to submitting a task in the distributed environment.
> Presumably a pipeline has dependent nodes and isn't a "straight line" from start to finish. Given that, why does the runner do a simple loop through the nodes?
I can try to answer better later today, but if you look at most runner code (e.g. see https://docs.kedro.org/en/stable/_modules/kedro/runner/thread_runner.html#ThreadRunner), you'll find that it processes a list of nodes with dependencies and chooses the next node from a set of "ready" nodes.
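To make the "ready set" idea concrete, here is a minimal sketch that uses Python's standard-library graphlib rather than Kedro's actual runner code. The node names and the deps mapping are made up for illustration, and a real runner would submit each node to a thread, process, or cluster instead of appending its name to a list.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each node name maps to the set of nodes it depends on.
deps = {
    "clean": {"load"},
    "features": {"clean"},
    "report": {"clean"},
    "train": {"features"},
}


def run_in_ready_order(deps):
    """Process nodes in dependency order, always picking from the ready set."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    executed = []
    while sorter.is_active():
        # get_ready() returns the nodes whose dependencies have all finished.
        # Sorting only makes this sketch deterministic; it is not required.
        for name in sorted(sorter.get_ready()):
            executed.append(name)  # a real runner would submit the node here
            sorter.done(name)      # marking it done may unlock downstream nodes
    return executed


order = run_in_ready_order(deps)
print(order)
```

Since a node only becomes ready once everything it depends on is done, the loop never runs a node before its inputs exist, even though the pipeline is not a straight line.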
Robert Lugg
08/09/2024, 2:57 PM
Robert Lugg
08/09/2024, 3:01 PM
Deepyaman Datta
08/09/2024, 4:15 PM
pipeline.nodes is toposorted, which is why you can just iterate through it
Nok Lam Chan
08/09/2024, 9:47 PM