RE: node execution order (sorry for beating a dead...
# questions
i
RE: node execution order (sorry for beating a dead horse) Are there any dependable rules regarding node execution order? Or is it always wholly non-deterministic? Say I have one pipeline with lots of nodes and interdependencies, but then I add a single node with unrelated inputs and outputs, so it is completely disconnected from the main pipeline. Would you expect a consistent behavior from this node, such as always executing at the start? Or is it as non-deterministic as the rest of the node execution order?
d
Which version of Kedro? I think that in versions later than the one you use, it's now deterministic.
i
For new projects which I'm working on we're using 0.18.14
d
I think it's a later version; probably 0.19.x guarantees this.
i
thanks, anywhere I can refer to to understand the new resolution order?
d
I have a feeling we did this before with the old library, but this PR uses graphlib, which was added to the Python standard library after Kedro was initially released: https://github.com/kedro-org/kedro/pull/3728/files
i
Nice, thanks for the concrete link. I'll have a look at graphlib toposort
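For readers following along, here is a minimal sketch of what graphlib's topological sort does on its own, outside Kedro. The node names and the dependency mapping are hypothetical, and this is not Kedro's internal code; it only illustrates the standard-library API that the linked PR is said to use.

```python
from graphlib import TopologicalSorter

# Hypothetical node names. Each key maps to the set of nodes it depends on.
dependencies = {
    "train_model": {"split_data"},
    "split_data": {"preprocess"},
    "preprocess": set(),
    "standalone_report": set(),  # a disconnected "silo" node
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The sort only promises that dependencies come before their dependents. Where the disconnected node lands relative to the chain is not something this sketch relies on.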
n
https://github.com/kedro-org/kedro/pull/1604 This was the PR that fixed the order, probably in 0.18.2/0.18.3.
For the same set of nodes, it resolves in the same order. There is no guarantee adding a silo node will execute first or not; I think it's partially alphabetical when there is a tie.
d
thanks Nok I couldn’t find that
i
Very in depth PR description, TIL a few things about Kedro internals, thank you 🙂
Re "no guarantee adding a silo node will execute first or not": would you expect a node with no input dependencies to run before every node with dependencies, or not that either? I feel like I just read that in either your or Joel's links.
n
Probably yes, at least in the old toposort (0.19.3 switched to graphlib, which I am less familiar with). The first step of toposort is to sort nodes into groups, and nodes without input dependencies will be executed first.
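The grouping idea described above can be sketched with the standard library alone. This is an illustration under assumptions, not Kedro's implementation: the helper name group_by_level and the node names are hypothetical, and the alphabetical sort inside each group is just one way to get a stable tie-break.

```python
from graphlib import TopologicalSorter


def group_by_level(dependencies):
    """Split nodes into groups; each group depends only on earlier groups."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    groups = []
    while sorter.is_active():
        # Everything whose predecessors are all done is ready now.
        ready = sorted(sorter.get_ready())  # sort names for a stable tie-break
        groups.append(ready)
        sorter.done(*ready)
    return groups


# Hypothetical node names. Each key maps to the set of nodes it depends on.
dependencies = {
    "train_model": {"split_data"},
    "split_data": {"preprocess"},
    "preprocess": set(),
    "standalone_report": set(),  # no inputs from other nodes
}

groups = group_by_level(dependencies)
print(groups)
# [['preprocess', 'standalone_report'], ['split_data'], ['train_model']]
```

In this model, nodes without dependencies all land in the first group, but their position within that group depends only on the tie-break, which matches the caution about not relying on a silo node running first.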
I guess this expectation was removed in this later PR?
I am not sure. Is there some characteristic that you need to rely on? I will probably check group_nodes instead.
i
I don't absolutely need to, but if it were documented behavior it would remove the need for some dummy I/O to force some nodes to execute at the very start. I will just add the I/Os anyway, as that is clearly the public way of declaring this. I was trying to avoid this because in my current pipeline it will be very unergonomic to add a dummy input to the main pipeline, as it is a modular pipeline imported from another library which I'd rather not modify. No big deal though. Thanks for your help, have a good weekend :)
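The dummy I/O workaround mentioned above can be modelled in plain Python. This is a sketch under assumptions rather than Kedro code: the Step class, the execution_order helper, and all dataset names (including the dummy "setup_done") are hypothetical. It only shows why declaring a dummy output on the setup node and a matching dummy input on the first real node pins the order.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Step:
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def execution_order(steps):
    # Map each dataset name to the step that produces it.
    producers = {out: step.name for step in steps for out in step.outputs}
    # A step depends on the producers of its inputs; raw inputs have none.
    graph = {
        step.name: {producers[i] for i in step.inputs if i in producers}
        for step in steps
    }
    return list(TopologicalSorter(graph).static_order())


steps = [
    Step("setup", (), ("setup_done",)),                # dummy output
    Step("load", ("raw", "setup_done"), ("clean",)),   # dummy input added
    Step("train", ("clean",), ("model",)),
]

order = execution_order(steps)
print(order)  # ['setup', 'load', 'train']
```

Without the "setup_done" edge, "setup" and "load" would be unordered relative to each other, so any consistent ordering would come from the sorter's tie-breaking rather than from a declared dependency.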
n
I would not recommend relying on that behavior. https://github.com/kedro-org/kedro/discussions/3758 I started a discussion to properly add a public API to support this.