# questions
h
Hi everyone, I have a setup in my pipeline.py file where I have a `Pipeline` (call it P1) and then a modular `pipeline` (P2) based off P1. The pipeline file returns both P1 and P2, so by default `kedro run` runs both P1 and P2. I can make it run just P2 by including a namespace and running `kedro run pipeline --namespace p2`, but when I do the same with P1, it still runs both P1 and P2 (I guess because P2 inherits P1's namespace?). Is there a way with tags/namespaces to run only P1 and not P2? Ideally I want to be able to choose whether to run P1, P2, or both just with the `kedro run` command, without having to change the code at all. Thanks so much!
m
I suggest you read the documentation: https://docs.kedro.org/en/stable/nodes_and_pipelines/pipeline_registry.html Having multiple pipelines is the most basic thing Kedro supports. All pipelines are modular.
h
Hmm, thanks, but I don't think that solves the issue. I have multiple pipelines, yes, but I'm unable to tell Kedro to run just P1 and not P2. If I apply any tags or names to P1, then when I try to run with them (e.g. `kedro run -p pipeline_name --tags p1`), it runs P1 and P2, because basing P2 off P1 (using the code below) clearly means the P1 tags/names are applied to both P1 and P2:
p1 = Pipeline([nodes], tags="p1", namespace="p1")
p2 = pipeline(pipe=p1, tags="p2", namespace="p2")
m
Read about `kedro run --pipeline=pipeline_name`.
h
Sorry, I should say, in case it's not clear, that I am running the pipelines by name. P1 and P2 both live in the same pipeline.py file. In the pipeline registry that pipeline is called `perm`, so the actual command I run is `kedro run -p perm --namespace p1`, but this still runs P1 and P2. However, if I run `kedro run -p perm --namespace p2`, it only runs P2. So I just want to be able to replicate that and run just P1.
n
"P1 and P2 both live in the same pipeline.py file"
Are they one pipeline or two pipelines? I think what @marrrcin is suggesting is that you can create them as separate pipeline objects. A pipeline.py can contain more than one pipeline. If you look at `pipeline_registry.py`, all Kedro pipelines are basically stored in a dictionary, with the key as the name of a pipeline and the value as the actual pipeline object. So you may have something like:
pipelines["p2"] = p2
pipelines["p1"] = p1
Then you can just use `kedro run -p p1`. Tags, names, and namespaces are all tools to help you slice a pipeline. If it doesn't make sense to inherit the `p1` tag, you can consider creating a base pipeline without tags and then tagging each derived pipeline separately, i.e.:
base_pipeline = pipeline([nodes])
p1 = pipeline(base_pipeline, tags="p1")
p2 = pipeline(base_pipeline, tags="p2")
There could be multiple options, happy to discuss further to see which fits best
h
They are separate pipelines, but p2 is basically based on p1. I think your suggestion at the end there with base_pipeline would work best, so I'll give that a go. Thank you very much!