# questions
m
Hello, I have a quick question about pipeline and node ordering. I have created 5 pipelines, each with its own nodes. Whenever I run the full environment with
kedro run --env env_name
the nodes from the different pipelines are interleaved in the run order, meaning it runs as below:
pipeline 1 --> Node 1
pipeline 2 --> Node 1
pipeline 2 --> Node 2
pipeline 3 --> Node 1
pipeline 1 --> Node 2
pipeline 3 --> Node 2
(Note: the node order within each pipeline is correct, but Kedro alternates between nodes from different pipelines.) However, I want them to run in the order below:
pipeline 1 --> Node 1
pipeline 1 --> Node 2
pipeline 2 --> Node 1
pipeline 2 --> Node 2
pipeline 3 --> Node 1
pipeline 3 --> Node 2
I have the following config in pipeline_registry:
return {"__default__": pipeline1 + pipeline2 + pipeline3 + pipeline4 + pipeline5}
d
Kedro doesn't guarantee node order beyond what you specify through nodes depending on other nodes (that create the DAG). In 0.18.2, there was a change to make node order the same across SequentialRunner runs, but this still wouldn't guarantee anything like this.
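To illustrate why both orders are acceptable to a DAG-based runner, here is a small plain-Python sketch (not Kedro code). It assumes, as in the question, that within each pipeline Node 2 depends on Node 1 and that nothing links one pipeline to another; the node names are made up for the illustration.

```python
# Each node maps to the set of nodes it depends on.
# Assumption: Node 2 of each pipeline depends only on Node 1 of the same pipeline.
dependencies = {
    "pipeline1.node1": set(),
    "pipeline1.node2": {"pipeline1.node1"},
    "pipeline2.node1": set(),
    "pipeline2.node2": {"pipeline2.node1"},
    "pipeline3.node1": set(),
    "pipeline3.node2": {"pipeline3.node1"},
}


def respects_dependencies(order, dependencies):
    """Return True if every node runs only after all nodes it depends on."""
    seen = set()
    for name in order:
        if not dependencies[name] <= seen:
            return False
        seen.add(name)
    # Every node must have been run at least once.
    return len(seen) == len(dependencies)


# The order observed in the question.
interleaved = [
    "pipeline1.node1", "pipeline2.node1", "pipeline2.node2",
    "pipeline3.node1", "pipeline1.node2", "pipeline3.node2",
]
# The order the question asks for.
grouped = [
    "pipeline1.node1", "pipeline1.node2", "pipeline2.node1",
    "pipeline2.node2", "pipeline3.node1", "pipeline3.node2",
]

interleaved_ok = respects_dependencies(interleaved, dependencies)
grouped_ok = respects_dependencies(grouped, dependencies)
```

Both checks pass, so with no cross-pipeline dependencies declared there is nothing that makes the grouped order preferable to the interleaved one.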
m
So how do I handle a situation like this? For example, the 1st pipeline does data loading and preprocessing, the 2nd training and evaluation, the 3rd deployment, and the 4th MLOps. Right now it tries to run a node in deployment that deploys a model which hasn't actually been trained yet in pipeline 2, and the same applies to most of the others.
d
If there are dependencies and you want to run in Kedro, define them. For example, if deployment tries to deploy a model from training, then the training pipeline should write to a model_output dataset and deployment should pick up that model_output dataset, so your dependencies will be defined in your DAG. If they're separate processes, use an orchestrator to schedule your pipelines, or run them in order yourself (e.g. kedro run --pipeline pipe1 && kedro run --pipeline pipe2 && ...).