# questions
s
Team, is there a way to control the order in which nodes execute in Kedro, or an option to wait before executing another node? Context: I have a node that is used in two pipelines. Both pipelines use the same input tables, but I expect the node in the second pipeline to run only after the first pipeline has finished, because its input files are updated as part of the first pipeline run. Because I have registered both pipelines to run by default in the registry, the node from the second pipeline runs sooner than I expect, and I don't want that.
# Pipeline A
Input X, Y --> node1 + node2 + node3 --> Output X (i.e Input X after update)

# Pipeline B 
Input X (after update from Pipeline A), Y --> node1 + node4 + node5 --> Output Z

Order of execution (node_Pipelinename)
node1_A
node1_B
node3_A
node2_A
node4_B
node5_B

Expected order of execution
node1_A
node3_A
node2_A
node1_B
node4_B
node5_B
m
Use dummy outputs/inputs if you really need to do this
a
If Pipeline B takes as input X after it’s been output from Pipeline A then shouldn’t the correct structure be this? 🤔 Where the modified output X is a whole new dataset?
# Pipeline A
Input X, Y --> node1 + node2 + node3 --> Output X2

# Pipeline B 
Input X2, Y --> node1 + node4 + node5 --> Output Z
This way kedro will automatically resolve the node running order exactly as you like.
s
Thanks for the suggestion. I didn't think of using a separate output, because the goal was to update the existing input directly. I'll try creating a separate output, or rather a separate catalog entry that points to the same file path.