# questions
m
In Kedro you can assign a pipeline to a namespace with the `namespace` argument (https://docs.kedro.org/en/stable/nodes_and_pipelines/namespaces.html#what-is-a-namespace). Is there a way to access the namespace attribute once you define the pipeline?
l
Hey Marshall, the `Node` objects within your pipelines should have a `namespace` attribute that you can access if you defined it previously. Does this help you?
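A minimal sketch of what that looks like, assuming a namespaced modular pipeline built with `kedro.pipeline.pipeline`; the node function and dataset names are placeholders:

```python
from kedro.pipeline import node, pipeline


def clean(raw):  # placeholder node function
    return raw


data_prep = pipeline(
    [node(clean, inputs="raw_data", outputs="clean_data", name="clean")],
    namespace="data_prep_a",
)

# Every Node in the pipeline carries the namespace it was built under.
for nd in data_prep.nodes:
    print(nd.name, nd.namespace)  # -> "data_prep_a.clean data_prep_a"
```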
m
Yes it does, thank you! Do you know if it's possible to also access the parameters passed to a pipeline? E.g. access the attributes in `Pipeline(pipe, inputs, outputs, parameters)`?
l
Could you give me an example of how you're trying to access them?
n
What are you trying to do? If you print out the `Pipeline` object directly, it may have the information you need.
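Continuing the small sketch above (same placeholder `data_prep` pipeline): printing the object, or looking at each node's `inputs`, shows the `params:` references it ended up with:

```python
# Reusing the placeholder `data_prep` pipeline from the sketch above.
print(data_prep)  # the repr lists every Node with its inputs and outputs

for nd in data_prep.nodes:
    # node.inputs includes any "params:..." entries, already renamed by
    # whatever namespace / parameters mapping was applied to the pipeline
    print(nd.name, nd.inputs)
```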
m
I'm trying to access the parameters and namespace in `pipeline_registry.py` at runtime, with the intention of modifying them based on the user's `tag` argument. The goal is to let the user choose between running two pipelines with mostly identical steps (the circle in the middle is an "XOR" block). The tricky part is the parameters files. DataPrep A has a `parameters.yml` file with the namespace `data_prep_a`. DataPrep B has a `parameters.yml` file with the namespace `data_prep_b`. Downstream, I have the Model subpipeline reference parameters via `params:data_prep.parameter_here`. Ideally, at runtime the parameters referenced by Model would be modified based on the tag argument. So if I run `kedro run -t A`, the parameters of each node in the model pipeline should be intercepted, and everywhere `params:data_prep.some_parameter_here` is referenced it should be replaced by `params:data_prep_a.some_parameter_here`. Since parameters can be defined in pipelines in addition to nodes, these parameters also need to be intercepted at the subpipeline level. Does that make sense? I realize I would be modifying attributes that aren't supposed to be modified at runtime.
n
I see. Instead of updating parameters at runtime, is it possible to have a separate node for Model/Deploy A and B? Your approach is overloading what `tag` is supposed to do (filtering instead of updating).
If anything, I think this will be clearer and you don't need to update parameters: `if tag_A: model_pipeline = model_pipeline_A elif tag_B: model_pipeline = model_pipeline_B`
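A rough, runnable version of that idea (a sketch only: the A/B switch is a hypothetical environment variable, since the real `--tags` value is not available inside `pipeline_registry.py`, and the node and dataset names are placeholders):

```python
# pipeline_registry.py (sketch; MODEL_VARIANT is a hypothetical switch)
import os

from kedro.pipeline import Pipeline, node, pipeline


def _fit(data):  # placeholder node function
    return data


# Two variants with mostly identical steps but different namespaces.
model_pipeline_a = pipeline(
    [node(_fit, inputs="clean_data", outputs="model", name="fit")],
    namespace="model_a",
)
model_pipeline_b = pipeline(
    [node(_fit, inputs="clean_data", outputs="model", name="fit")],
    namespace="model_b",
)


def register_pipelines() -> dict[str, Pipeline]:
    if os.environ.get("MODEL_VARIANT", "A") == "A":
        model_pipeline = model_pipeline_a
    else:
        model_pipeline = model_pipeline_b
    return {"__default__": model_pipeline}
```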
m
The problem is that each rectangle represents a subpipeline, not a node. I'd ideally not violate DRY in this way, especially because the chart I made is a simplified representation of my pipeline, and other subpipelines also have dependencies on the parameters from `data_prep`. Fully acknowledge that I am not using `tag` in the way it is designed to be used 😅
Do you see any reason why overloading the tag argument wouldn't work, though?
I'll say that if there was a way to temporarily remove a `catalog`/`parameters` file from consideration at runtime, I think that would also work.
n
Overloading it should work as long as you are updating the correct parameters. I would still do this before the pipeline is actually created rather than after it is created.
If I understand correctly, you want this: `my_pipeline = pipeline(some_node, inputs=f"{tag}.some_params_group", ...)`
You don't really need to update the pipeline object, but rather get the right namespace before you create the pipeline.
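Spelled out a little more, a sketch of choosing the namespace before the `Pipeline` object exists; `variant`, the node function, and the dataset/parameter names are placeholders for however the choice reaches `pipeline.py`:

```python
from kedro.pipeline import node, pipeline


def train(data, some_parameter):  # placeholder node function
    return data


def create_pipeline(variant: str = "data_prep_a"):
    # The namespace is resolved before the Pipeline object is created,
    # so nothing has to be patched on it afterwards.
    return pipeline(
        [node(train,
              inputs=["clean_data", f"params:{variant}.some_parameter_here"],
              outputs="model",
              name="train")]
    )
```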
m
So in this case I would modify each subpipeline's `pipeline.py` file instead of `pipeline_registry.py`?
Hoping a large screenshot of the current pipeline is not too annoying to show the current structure. If I run `kedro run -t from_file`, the namespace `deploy_application` should change to `deploy_application_from_file`, and the parameters referencing `params:deploy_forecast` should change to `params:deploy_forecast_from_file`.
Sorry to keep harping on this thread. If I could do this one thing before the pipeline executes, I think it would solve all of my problems: access one of the parameters files and change the top-level namespace. So let's say I have a pipeline `model` that has two possible parameters files, `model_a.yml` and `model_b.yml`, both of which have the namespace `model`. If I run `kedro run -t A`, I would like the namespace of `model_b.yml` to change to something unused, such as `x_model_x`.
n
> So let's say I have a pipeline `model` that has two possible parameters files, `model_a.yml` and `model_b.yml`, both of which have the namespace `model`. If I run `kedro run -t A`, I would like the namespace of `model_b.yml` to change to something unused, such as `x_model_x`.
This sounds a bit strange. Why would you change the namespace of a file instead of using different namespaces and selecting the correct one?
m
Because selecting the correct namespace involves changing the namespace of each `pipeline` file to the correct namespace, as well as all of the pipeline parameters, whereas changing only the namespace within a parameters file requires no changes to any of the pipelines.
For my own sense of closure, I figured out an approach:
• `model_a.yml` and other A-related parameters files get put in a folder `model_a`.
• `model_b.yml` and other B-related parameters files get put in a folder `model_b`.
• In `globals.yml` or similar, specify which path between A or B to use (`parameters_source`).
• In `pipeline_registry.py`, filter pipelines based on this file (see the sketch after the snippet below).
• In `settings.py`, change the config patterns based on this file, e.g.
```python
CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
    "config_patterns": {
        "parameters": [
            "parameters*",
            "parameters*/**",
            f"{parameters_source}/parameters*",
        ],
    },
}
```
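And a sketch of the `pipeline_registry.py` half of that approach. Everything here is assumed rather than taken from the project: the `_parameters_source` helper, the `conf/base/globals.yml` path, and the `model_a`/`model_b` pipeline names; it also assumes `parameters_source` is read the same way in `settings.py` before `CONFIG_LOADER_ARGS` is built.

```python
# pipeline_registry.py (sketch)
from pathlib import Path

import yaml
from kedro.framework.project import find_pipelines
from kedro.pipeline import Pipeline


def _parameters_source() -> str:
    # Hypothetical helper: read the A/B switch ("model_a" or "model_b")
    # from the same globals.yml that settings.py uses.
    globals_conf = yaml.safe_load(Path("conf/base/globals.yml").read_text())
    return globals_conf["parameters_source"]


def register_pipelines() -> dict[str, Pipeline]:
    source = _parameters_source()
    unwanted = "model_b" if source == "model_a" else "model_a"

    pipelines = find_pipelines()
    pipelines.pop(unwanted, None)  # drop the variant that was not selected
    pipelines["__default__"] = sum(
        (p for name, p in pipelines.items() if name != "__default__"),
        Pipeline([]),
    )
    return pipelines
```

Like the earlier suggestion, this keeps all the selection logic before any `Pipeline` object is created, so nothing has to be mutated at run time.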