RE: node execution order (sorry for beating a dead horse) | Kedro #questions


Iñigo Hidalgo

05/24/2024, 2:08 PM

RE: node execution order (sorry for beating a dead horse) Are there any dependable rules regarding node execution order? Or is it always wholly non-deterministic? Say I have one pipeline with lots of nodes and interdependencies, but then I add a single node with unrelated inputs and outputs, so it is completely disconnected from the main pipeline. Would you expect consistent behavior from this node, such as always executing at the start? Or is its position as non-deterministic as the rest of the node execution order?

datajoely

05/24/2024, 2:08 PM

Which version of Kedro? I think it's now deterministic in versions later than the one you use.

Iñigo Hidalgo

05/24/2024, 2:08 PM

For the new projects I'm working on, we're using 0.18.14.

datajoely

05/24/2024, 2:31 PM

I think it's a later version; probably 0.19.x guarantees this.

Iñigo Hidalgo

05/24/2024, 2:32 PM

Thanks, is there anywhere I can look to understand the new resolution order?

datajoely

05/24/2024, 2:35 PM

I have a feeling we did this before with the old library, but this PR uses `graphlib`, which was added to the Python standard library (in 3.9) after Kedro was initially released: https://github.com/kedro-org/kedro/pull/3728/files
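For context on `graphlib`: it provides `TopologicalSorter`, which turns a mapping of "node -> set of predecessors" into a valid execution order. A minimal sketch, using a hypothetical toy graph of node names rather than Kedro's own code:

```python
from graphlib import TopologicalSorter

# Hypothetical toy pipeline: each key maps to the set of nodes it depends on.
dependencies = {
    "clean": {"load"},
    "features": {"clean"},
    "train": {"features"},
    "load": set(),
}

# static_order() yields every node after all of its predecessors
# (it raises graphlib.CycleError if the graph has a cycle).
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['load', 'clean', 'features', 'train']
```

Because this graph is a single chain, only one valid order exists; with independent branches, several orders are valid and the sorter simply picks one.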

Iñigo Hidalgo

05/24/2024, 2:38 PM

Nice, thanks for the concrete link. I'll have a look at graphlib toposort

Nok Lam Chan

05/24/2024, 2:41 PM

https://github.com/kedro-org/kedro/pull/1604 was the PR that fixed the order, probably in 0.18.2/0.18.3.

Nok Lam Chan

05/24/2024, 2:43 PM

For the same set of nodes, it resolves to the same order. There's no guarantee that an added silo node will execute first; I think ties are broken partially alphabetically.
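To illustrate the point, here is a minimal sketch (not Kedro's implementation; the `deterministic_order` helper and the node names are hypothetical) of a toposort that breaks ties alphabetically. The same set of nodes always yields the same order, but where a disconnected "silo" node lands depends on its name, not on the fact that it is disconnected:

```python
from graphlib import TopologicalSorter


def deterministic_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Toposort that breaks ties by sorting ready nodes by name.

    Illustrative only: one way to make a toposort repeatable,
    not the code Kedro actually uses.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    order: list[str] = []
    while sorter.is_active():
        # All nodes whose predecessors are done; sort them for a stable tie-break.
        for name in sorted(sorter.get_ready()):
            order.append(name)
            sorter.done(name)
    return order


deps = {"load": set(), "clean": {"load"}, "train": {"clean"}, "silo": set()}
print(deterministic_order(deps))  # ['load', 'silo', 'clean', 'train']

# Same graph, silo node renamed: it now sorts ahead of "load".
renamed = {"load": set(), "clean": {"load"}, "train": {"clean"}, "a_silo": set()}
print(deterministic_order(renamed))  # ['a_silo', 'load', 'clean', 'train']
```

Dict insertion order does not affect the result, because ready nodes are always sorted before being emitted.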


datajoely

05/24/2024, 2:44 PM

Thanks Nok, I couldn't find that.

Iñigo Hidalgo

05/24/2024, 2:44 PM

Very in-depth PR description; TIL a few things about Kedro internals, thank you 🙂

Iñigo Hidalgo

05/24/2024, 2:48 PM

> no guarantee adding a silo node will execute first or not

would you expect a node with no input dependencies to run before every node with dependencies, or not that either? i feel like i just read that in either your or Joel's links

Iñigo Hidalgo

05/24/2024, 2:50 PM

Looking at https://github.com/kedro-org/kedro/pull/3728/files#diff-07c8d8697b22c985a5f73c94938c2719064e37db82ab95357cf0831dd62367fb I guess this expectation was removed in this later PR?


Nok Lam Chan

05/24/2024, 2:51 PM

> would you expect a node with no input dependencies to run before every node with dependencies, or not that either? i feel like i just read that in either your or Joel's links

Probably yes, at least with the old `toposort` (0.19.3 switched to `graphlib`, which I'm less familiar with). The first step of the toposort is to sort the nodes into groups, and nodes without input dependencies will be executed first.
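The grouping step described here can be sketched with `graphlib` too. This assumes Kedro-style node specs (a hypothetical `name -> (inputs, outputs)` dict; nothing here imports Kedro) and infers dependencies the way a dataset-driven pipeline does: a node depends on whichever node produces one of its inputs. Nodes whose inputs all come from outside the pipeline land in the first group. As far as I know, Kedro's `Pipeline` exposes a similar `grouped_nodes` property, but check the API reference for your version.

```python
from graphlib import TopologicalSorter

# Hypothetical node specs: name -> (input datasets, output datasets).
NODES = {
    "preprocess": (["raw_data"], ["clean_data"]),
    "train": (["clean_data", "params:model"], ["model"]),
    "evaluate": (["model", "clean_data"], ["metrics"]),
    "log_env": (["params:env"], ["env_report"]),  # disconnected "silo" node
}


def build_dependencies(nodes):
    """Map each node to the set of nodes that produce its inputs."""
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    return {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in nodes.items()
    }


def grouped(nodes):
    """Topologically ordered groups: each group depends only on earlier groups."""
    sorter = TopologicalSorter(build_dependencies(nodes))
    sorter.prepare()
    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        groups.append(ready)
        sorter.done(*ready)
    return groups


print(grouped(NODES))  # [['log_env', 'preprocess'], ['train'], ['evaluate']]
```

This only tells you which group a node belongs to; it says nothing about which node a runner executes first within a group.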

Nok Lam Chan

05/24/2024, 2:54 PM

> I guess this expectation was removed in this later PR?

I'm not sure. Is there some characteristic you need to rely on? I would probably check `group_nodes` instead.

Iñigo Hidalgo

05/24/2024, 2:58 PM

I don't absolutely need to, but if it were documented behavior, it would remove the need for some dummy I/O to force certain nodes to execute at the very start. I'll just add the I/O anyway, since that is clearly the public way of declaring this. I was trying to avoid it because in my current pipeline it will be very unergonomic to add a dummy input to the main pipeline: it's a modular pipeline imported from another library, which I'd rather not modify. No big deal though. Thanks for your help, have a good weekend :)
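The dummy I/O workaround can be sketched on the same kind of hypothetical node specs: a `setup` node writes a placeholder `setup_done` output, and the first main node lists it as an extra input, which adds an explicit edge to the graph. All names here are illustrative. In a real Kedro pipeline, the consuming node's function would also need an extra parameter for that input, which is exactly what makes this awkward for an imported modular pipeline.

```python
from graphlib import TopologicalSorter


def execution_order(nodes):
    """Infer producer -> consumer edges from dataset names, then toposort."""
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    deps = {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in nodes.items()
    }
    return list(TopologicalSorter(deps).static_order())


# Hypothetical nodes: name -> (inputs, outputs).
with_dummy = {
    "setup": ([], ["setup_done"]),  # writes a placeholder "setup_done" output
    "preprocess": (["raw_data", "setup_done"], ["clean_data"]),  # extra dummy input
    "train": (["clean_data"], ["model"]),
}

order = execution_order(with_dummy)
print(order)  # ['setup', 'preprocess', 'train']
```

Without the `setup_done` edge, `setup` would have no predecessors or successors, so nothing in the graph would constrain it to run before `preprocess`.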

Nok Lam Chan

05/24/2024, 3:02 PM

I would not recommend relying on that behavior. I started a discussion about properly adding a public API to support this: https://github.com/kedro-org/kedro/discussions/3758

