Hi everyone, I have a setup in my pipeline.py file... (Kedro #questions)

Harry Lawes

07/11/2024, 10:24 AM

Hi everyone, I have a setup in my pipeline.py file where I have a Pipeline (call it P1) and then a modular pipeline (P2) based off P1. The pipeline file returns both P1 and P2, so by default when running

kedro run

it runs both P1 and P2. I can make it run just P2 by including a namespace and running

kedro run pipeline --namespace p2

but when I do the same with P1, it still runs both P1 and P2 (I guess because P2 inherits P1's namespace?). Is there a way with tags or namespaces to run only P1 and not P2? Ideally I want to be able to choose whether to run P1, P2 or both just with the

kedro run

command, without having to change the code at all. Thanks so much!

marrrcin

07/11/2024, 10:31 AM

I suggest you read the documentation (https://docs.kedro.org/en/stable/nodes_and_pipelines/pipeline_registry.html). Having multiple pipelines is the most basic thing Kedro supports. All pipelines are modular.

Harry Lawes

07/11/2024, 4:14 PM

Hmm, thanks, but I don't think that solves the issue. I do have multiple pipelines, but I'm unable to tell Kedro to run just P1 and not P2. If I apply any tags or names to P1 and then try to run them, e.g. with

kedro run -p pipeline_name --tags p1

it runs both P1 and P2, because basing P2 off P1 (using the code below) clearly means the P1 tags/names are applied to both P1 and P2.

p1 = Pipeline([nodes], tags="p1", namespace="p1")
p2 = pipeline(pipe=p1, tags="p2", namespace="p2")
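
A toy model may help explain the behaviour described above. This is a hypothetical, stdlib-only simplification, not Kedro's implementation: `ToyNode`, `wrap` and `with_tag` are illustrative names, and the model assumes (consistent with what Harry observes) that wrapping a pipeline with new tags adds them to each node's existing tags instead of replacing them. Because P2 is built from P1, every P2 node ends up carrying both "p1" and "p2".

```python
from dataclasses import dataclass


# Hypothetical stand-in for a Kedro node: just a name and a set of tags.
@dataclass(frozen=True)
class ToyNode:
    name: str
    tags: frozenset[str] = frozenset()


def wrap(nodes: list[ToyNode], tag: str, prefix: str) -> list[ToyNode]:
    """Copy nodes under a name prefix, ADDING `tag` to the tags they already have."""
    return [ToyNode(f"{prefix}.{n.name}", n.tags | {tag}) for n in nodes]


def with_tag(nodes: list[ToyNode], tag: str) -> list[str]:
    """Names of the nodes carrying `tag`, loosely mimicking a tag filter."""
    return [n.name for n in nodes if tag in n.tags]


raw = [ToyNode("clean"), ToyNode("summarise")]
p1 = wrap(raw, "p1", "p1")  # P1 nodes carry {"p1"}
p2 = wrap(p1, "p2", "p2")   # built from P1, so P2 nodes carry {"p1", "p2"}

everything = p1 + p2
print(with_tag(everything, "p1"))  # P1 and P2 nodes both match
print(with_tag(everything, "p2"))  # only P2 nodes match
```

In this model, filtering on "p2" isolates P2, but filtering on "p1" cannot isolate P1, which mirrors the asymmetry described in the thread.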

marrrcin

07/11/2024, 4:45 PM

Read about

kedro run --pipeline=pipeline_name

Harry Lawes

07/11/2024, 9:16 PM

Sorry, I should have made it clear that I am running the pipelines by name. P1 and P2 both live in the same pipeline.py file, and in the pipeline registry that pipeline is called perm, so the actual kedro run command I run is

kedro run -p perm --namespace p1

but this still runs both P1 and P2. However, if I run

kedro run -p perm --namespace p2

it only runs P2. So I just want to replicate that and run only P1.

Nok Lam Chan

07/16/2024, 1:30 PM

> P1 and P2 both live in the same pipeline.py file

Are they one pipeline or two pipelines? I think what @marrrcin is suggesting is that you can create them as separate pipeline objects; a pipeline.py can define more than one pipeline. If you look at pipeline_registry.py, all Kedro pipelines are basically a dictionary, with the name of a pipeline as the key and the actual pipeline object as the value. So you may have something like

pipelines["p2"] = p2
pipelines["p1"] = p1

Then you can just use

kedro run -p p1

Tags, names and namespaces are all tools to help you slice a pipeline. If it doesn't make sense for P2 to inherit the p1 tag, you could create a base pipeline without a tag and then tag each copy separately, i.e.

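from kedro.pipeline import pipeline

base_pipeline = pipeline([nodes])  # untagged base; [nodes] stands for your node list
p1 = pipeline(base_pipeline, tags="p1")  # P1 gets only the "p1" tag
p2 = pipeline(base_pipeline, tags="p2")  # P2 gets only the "p2" tag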

There could be multiple options; happy to discuss further to see which fits best.
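
Putting Nok Lam's two suggestions together, here is another hypothetical, stdlib-only toy model (not the Kedro API). The registry is a plain dict from pipeline name to a list of nodes, mirroring how pipeline_registry.py maps names to pipeline objects, and P1 and P2 are each derived from an untagged base, so neither inherits the other's tag. `ToyNode`, `derive` and `select` are illustrative names; the name prefix just keeps the two copies distinguishable in the combined entry. The `"__default__"` key follows Kedro's convention for the pipeline that a plain `kedro run` executes.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a Kedro node: a name plus a set of tags.
@dataclass(frozen=True)
class ToyNode:
    name: str
    tags: frozenset[str] = frozenset()


def derive(base: list[ToyNode], tag: str) -> list[ToyNode]:
    """Copy an untagged base under a name prefix and give the copy a single tag."""
    return [ToyNode(f"{tag}.{n.name}", n.tags | {tag}) for n in base]


def select(registry: dict[str, list[ToyNode]], name: str = "__default__",
           tag: Optional[str] = None) -> list[str]:
    """Pick a pipeline by name, then optionally keep only nodes carrying `tag`."""
    return [n.name for n in registry[name] if tag is None or tag in n.tags]


base_pipeline = [ToyNode("clean"), ToyNode("summarise")]  # no tags at all
p1 = derive(base_pipeline, "p1")
p2 = derive(base_pipeline, "p2")

# Name -> pipeline, like the dict returned from pipeline_registry.py.
registry = {"p1": p1, "p2": p2, "__default__": p1 + p2}

print(select(registry, "p1"))      # only P1's nodes
print(select(registry, "p2"))      # only P2's nodes
print(select(registry))            # both, like a plain `kedro run`
print(select(registry, tag="p1"))  # the p1 tag no longer leaks into P2
```

In a real project the equivalent choice would be made with `kedro run --pipeline p1` or `kedro run --pipeline p2`, with a plain `kedro run` picking up the `__default__` entry.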

Harry Lawes

07/16/2024, 3:56 PM

They are separate pipelines, but P2 is basically based on P1. I think your suggestion at the end there with base_pipeline would work best, though; I'll give that a go. Thank you very much!
