# questions
m
In Kedro you can assign a pipeline to a namespace with the `namespace` argument (https://docs.kedro.org/en/stable/nodes_and_pipelines/namespaces.html#what-is-a-namespace). Is there a way to access the namespace attribute once you define the pipeline?
l
Hey Marshall, the `Node` objects within your pipelines should have a `namespace` attribute that you can access if you defined it previously. Does this help you?
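A minimal sketch of what that looks like, assuming a namespaced modular pipeline built with `kedro.pipeline.pipeline`; the node function and dataset names are placeholders:

```python
from kedro.pipeline import node, pipeline


def clean(raw):  # placeholder node function
    return raw


data_prep = pipeline(
    [node(clean, inputs="raw_data", outputs="clean_data", name="clean")],
    namespace="data_prep_a",
)

# Every Node in the pipeline carries the namespace it was built under.
for nd in data_prep.nodes:
    print(nd.name, nd.namespace)  # -> "data_prep_a.clean data_prep_a"
```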
m
Yes it does, thank you! Do you know if it's possible to also access the parameters passed to a pipeline? E.g. access the attributes in `Pipeline(pipe, inputs, outputs, parameters)`?
l
Could you give me an example of how you're trying to access them?
n
What are you trying to do? If you print out the `Pipeline` object directly, it may have the information you need.
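Continuing the small sketch above (same placeholder `data_prep` pipeline): printing the object, or looking at each node's `inputs`, shows the `params:` references it ended up with:

```python
# Reusing the placeholder `data_prep` pipeline from the sketch above.
print(data_prep)  # the repr lists every Node with its inputs and outputs

for nd in data_prep.nodes:
    # node.inputs includes any "params:..." entries, already renamed by
    # whatever namespace / parameters mapping was applied to the pipeline
    print(nd.name, nd.inputs)
```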
m
I'm trying to access the parameters and namespace in `pipeline_registry.py` at runtime, with the intention of modifying them based on the user's `tag` argument. The goal is to let the user choose between running two pipelines with mostly identical steps (the circle in the middle is an "XOR" block). The tricky part is the parameters files. DataPrep A has a `parameters.yml` file with the namespace `data_prep_a`. DataPrep B has a `parameters.yml` file with the namespace `data_prep_b`. Downstream, I have the Model subpipeline reference parameters via `params:data_prep.parameter_here`. Ideally, at runtime the parameters referenced by Model would be modified based on the tag argument. So if I run `kedro run -t A`, the parameters of each node in the model pipeline should be intercepted, and everywhere `params:data_prep.some_parameter_here` is referenced it should be replaced by `params:data_prep_a.some_parameter_here`. Since parameters can be defined in pipelines in addition to nodes, these parameters also need to be intercepted at the subpipeline level. Does that make sense? I realize I would be modifying attributes that aren't supposed to be modified at runtime.
n
I see. Instead of updating parameters at runtime, is it possible to have a separate node for Model/Deploy A and B? Your approach is overloading what `tag` is supposed to do (filtering instead of updating).
If anything, I think this will be clearer and you don't need to update parameters: `if tag_A: model_pipeline = model_pipeline_A elif tag_B: model_pipeline = model_pipeline_B`
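A rough, runnable version of that idea (a sketch only: the A/B switch is a hypothetical environment variable, since the real `--tags` value is not available inside `pipeline_registry.py`, and the node and dataset names are placeholders):

```python
# pipeline_registry.py (sketch; MODEL_VARIANT is a hypothetical switch)
import os

from kedro.pipeline import Pipeline, node, pipeline


def _fit(data):  # placeholder node function
    return data


# Two variants with mostly identical steps but different namespaces.
model_pipeline_a = pipeline(
    [node(_fit, inputs="clean_data", outputs="model", name="fit")],
    namespace="model_a",
)
model_pipeline_b = pipeline(
    [node(_fit, inputs="clean_data", outputs="model", name="fit")],
    namespace="model_b",
)


def register_pipelines() -> dict[str, Pipeline]:
    if os.environ.get("MODEL_VARIANT", "A") == "A":
        model_pipeline = model_pipeline_a
    else:
        model_pipeline = model_pipeline_b
    return {"__default__": model_pipeline}
```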
m
The problem is that each rectangle represents a subpipeline, not a node. I'd ideally not violate DRY in this way, especially because the chart I made is a simplified representation of my pipeline, and other subpipelines also have dependencies on the parameters from `data_prep`. Fully acknowledge that I am not using `tag` in the way it is designed to be used 😅
Do you see any reason why overloading the tag argument wouldn't work, though?
I'll say that if there was a way to temporarily remove a `catalog`/`parameters` file from consideration at runtime, I think that would also work.
n
Overloading it should work as long as you are updating the correct parameters. I would still do this before the pipeline is actually created rather than after it is created.
If I understand correctly, you want this: `my_pipeline = pipeline(some_node, inputs=f"{tag}.some_params_group", ...)`
You don't really need to update the pipeline object, but rather get the right namespace before you create the pipeline.
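Spelled out a little more, a sketch of choosing the namespace before the `Pipeline` object exists; `variant`, the node function, and the dataset/parameter names are placeholders for however the choice reaches `pipeline.py`:

```python
from kedro.pipeline import node, pipeline


def train(data, some_parameter):  # placeholder node function
    return data


def create_pipeline(variant: str = "data_prep_a"):
    # The namespace is resolved before the Pipeline object is created,
    # so nothing has to be patched on it afterwards.
    return pipeline(
        [node(train,
              inputs=["clean_data", f"params:{variant}.some_parameter_here"],
              outputs="model",
              name="train")]
    )
```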
m
So in this case I would modify each subpipeline's `pipeline.py` file instead of `pipeline_registry.py`?
Hoping a large screenshot of the current pipeline is not too annoying to show the current structure. If I run `kedro run -t from_file`, the namespace `deploy_application` should change to `deploy_application_from_file`, and the parameters referencing `params:deploy_forecast` should change to `params:deploy_forecast_from_file`.
Sorry to keep harping on this thread. If I could do this one thing before the pipeline executes, I think it would solve all of my problems: access one of the parameters files and change the top-level namespace. So let's say I have a pipeline `model` that has two possible parameters files, `model_a.yml` and `model_b.yml`, both of which have the namespace `model`. If I run `kedro run -t A`, I would like the namespace of `model_b.yml` to change to something unused, such as `x_model_x`.
n
> So let's say I have a pipeline `model` that has two possible parameters files, `model_a.yml` and `model_b.yml`, both of which have the namespace `model`. If I run `kedro run -t A`, I would like the namespace of `model_b.yml` to change to something unused, such as `x_model_x`.
This sounds a bit strange. Why would you change the namespace of a file instead of using different namespaces and selecting the correct one?
m
Because selecting the correct namespace involves changing the namespace of each `pipeline` file to the correct namespace, as well as all of the pipeline parameters, whereas changing only the namespace within a parameters file requires no changes to any of the pipelines.
For my own sense of closure, I figured out an approach:
• `model_a.yml` and other A-related parameters files get put in a folder `model_a`.
• `model_b.yml` and other B-related parameters files get put in a folder `model_b`.
• In `globals.yml` or similar, specify which path between A or B to use (`parameters_source`).
• In `pipeline_registry.py`, filter pipelines based on this file (see the sketch after the snippet below).
• In `settings.py`, change the config patterns based on this file, e.g.
```python
CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
    "config_patterns": {
        "parameters": [
            "parameters*",
            "parameters*/**",
            f"{parameters_source}/parameters*",
        ],
    },
}
```
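And a sketch of the `pipeline_registry.py` half of that approach. Everything here is assumed rather than taken from the project: the `_parameters_source` helper, the `conf/base/globals.yml` path, and the `model_a`/`model_b` pipeline names; it also assumes `parameters_source` is read the same way in `settings.py` before `CONFIG_LOADER_ARGS` is built.

```python
# pipeline_registry.py (sketch)
from pathlib import Path

import yaml
from kedro.framework.project import find_pipelines
from kedro.pipeline import Pipeline


def _parameters_source() -> str:
    # Hypothetical helper: read the A/B switch ("model_a" or "model_b")
    # from the same globals.yml that settings.py uses.
    globals_conf = yaml.safe_load(Path("conf/base/globals.yml").read_text())
    return globals_conf["parameters_source"]


def register_pipelines() -> dict[str, Pipeline]:
    source = _parameters_source()
    unwanted = "model_b" if source == "model_a" else "model_a"

    pipelines = find_pipelines()
    pipelines.pop(unwanted, None)  # drop the variant that was not selected
    pipelines["__default__"] = sum(
        (p for name, p in pipelines.items() if name != "__default__"),
        Pipeline([]),
    )
    return pipelines
```

Like the earlier suggestion, this keeps all the selection logic before any `Pipeline` object is created, so nothing has to be mutated at run time.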