Team wondering if there is a way to control the node order e Kedro #questions

sujdurai

03/21/2023, 2:39 AM

Team, I'm wondering if there is a way to control the node execution order in kedro, or an option to wait before executing another node. Context: I have a node that is used in two pipelines. Both pipelines use the same input tables, but I expect the node in the second pipeline to run only after my first pipeline has finished, because the input files for the node in the second pipeline are updated as part of the first pipeline run. Because I have registered both pipelines to run as the default in the registry, the node from the second pipeline runs sooner than I expect, and I don't want that.

# Pipeline A
Input X, Y --> node1 + node2 + node3 --> Output X (i.e. Input X after update)

# Pipeline B
Input X (after update from Pipeline A), Y --> node1 + node4 + node5 --> Output Z

Actual order of execution (node_Pipelinename)
node1_A
node1_B
node3_A
node2_A
node4_B
node5_B

Expected order of execution
node1_A
node3_A
node2_A
node1_B
node4_B
node5_B

marrrcin

03/21/2023, 7:49 AM

Use dummy outputs/inputs if you really need to do this

Antony Milne

03/21/2023, 9:58 AM

If Pipeline B takes X as input after it has been output from Pipeline A, then shouldn't the correct structure be this, where the modified output X is a whole new dataset? 🤔

# Pipeline A
Input X, Y --> node1 + node2 + node3 --> Output X2

# Pipeline B 
Input X2, Y --> node1 + node4 + node5 --> Output Z

👍🏼 1

Antony Milne

03/21/2023, 9:58 AM

This way kedro will automatically resolve the node running order exactly as you like.

sujdurai

03/21/2023, 12:52 PM

Thanks for the suggestion. I didn't think of using a separate output, because the goal was to update the existing input directly. I'll try to create a separate output, or rather point it to a separate catalog entry that has the same file path.
