Harry Lawes
07/11/2024, 10:24 AM
Pipeline (call it P1) and then a modular pipeline (P2) based off P1. The pipeline file returns both P1 and P2, so by default when running kedro run it runs both P1 and P2.
I can make it just run P2 by including a namespace and running kedro run pipeline --namespace p2, but when doing the same with P1, it still runs both P1 and P2 (I guess because P2 inherits P1's namespace?). Is there a way with tags / namespaces to only run P1 but not P2?
Ideally I want to be able to choose whether to run P1, P2 or both just with the kedro run command and not have to change the code at all. Thanks so much!
marrrcin
07/11/2024, 10:31 AM
Harry Lawes
07/11/2024, 4:14 PM
kedro run -p pipeline_name --tags p1), it runs P1 and P2, because clearly basing P2 off P1 (using code below) means the P1 tags/names are being applied to both P1 and P2
p1 = Pipeline([nodes], tags="p1", namespace="p1")
p2 = pipeline(pipe=p1, tags="p2", namespace="p2")
marrrcin
07/11/2024, 4:45 PM
kedro run --pipeline=pipeline_name
Harry Lawes
07/11/2024, 9:16 PM
perm
so the actual kedro run command I run is kedro run -p perm --namespace p1, but this still runs P1 and P2.
However, if I run kedro run -p perm --namespace p2, it only runs P2. So I just want to be able to replicate that and run just P1.
Nok Lam Chan
07/16/2024, 1:30 PM
"P1 and P2 both live in the same pipeline.py file"
Are they one pipeline or two pipelines? I think what @marrrcin is suggesting is that you can create them as separate pipeline objects. A pipeline.py could define more than one pipeline. If you look at pipeline_registry.py, all Kedro pipelines are basically stored in a dictionary, with the key as the name of a pipeline and the value as the actual pipeline object.
So you may have something like
pipelines["p2"] = p2
pipelines["p1"] = p1
Then you can just use kedro run -p p1.
Tags/names/namespaces are all tools to help you slice a pipeline. If it doesn't make sense to inherit the p1 tag, you can consider creating a base pipeline without tags, and then tagging the copies separately, i.e.
base_pipeline = pipeline([nodes])
p1 = pipeline(base_pipeline, tags="p1")
p2 = pipeline(base_pipeline, tags="p2")
There could be multiple options, happy to discuss further to see which fits best.
Harry Lawes
07/16/2024, 3:56 PM