Hi everyone, Is there a way to have “truly global ...
# questions
m
Hi everyone, is there a way to have “truly global params”, i.e. params that are immune to namespacing? So far, it seems that using the namespace feature involves creating as many duplicated params as there are namespaces… This is unfortunate since (and I guess I’m not the only one in this situation) a fairly big portion of my config is identical across namespaces and is needlessly duplicated… Granted, I could use templating or anchors / aliases, but this feels a bit “hacky”. Is there a “cleaner” / more “elegant” way? Thanks, M
m
I have no idea if this is doable in Kedro, but I think the more elegant way would be to create a package with global settings. That package can then be reused in various places
I think in general when people want to reuse anything - settings, logic, etc - across projects, packaging is the way to go
m
Hi @Michel van den Berg Thanks for your answer & suggestion. However, I think that packaging would not be a solution in my specific case. As I understand it, packaging would result in encapsulation / information hiding, which is not what I am looking for: I want global params that are namespace-agnostic while remaining “exposed” and “editable”. Thanks again though. M.
n
Hey @Marc Gris, if your parameters are generic across different pipelines, it should be fine to leave them without a namespace. Did you hit any particular issue?
i.e. leaving your global parameters in
conf/base/parameters.yml
, while leaving the parameters for specific scope in
conf/base/parameters/some_pipeline.yml
m
Hi @Nok Lam Chan Thx for your answer. I did run into issues indeed. But probably because I deviated from the “file structure” you have just outlined 😅 Will try that asap and get back to you. Thx again M.
n
What’s your current structure? In general we advise sticking with the default unless a change is necessary, but it’s very easy to configure. By default, Kedro searches for parameters with this pattern.
# settings.py
CONFIG_LOADER_ARGS = {
    "config_patterns": {
        # Kedro's default patterns look for files named parameters*
        "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
    }
}
https://docs.kedro.org/en/latest/configuration/advanced_configuration.html#how-to-change-which-configuration-files-are-loaded
m
@Nok Lam Chan OK, I got to give it a shot. It did not work. In
conf/base/parameters.yml
I have a top-level key called
schema
Passing
inputs="params:schema"
to a node does work without applying a namespace to the pipeline. But if a namespace is applied, I get the following error:
ValueError: Pipeline input(s) {'params:<namespace>.schema'} not found in the DataCatalog
n
Can you show me the node definition?
m
Sure @Nok Lam Chan … Here it is: It’s actually just a “dummy node”
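from kedro.pipeline import Pipeline, node, pipeline


def create_pipeline(**kwargs) -> Pipeline:
    # Dummy node that just prints the parameter it receives
    dummy_node = node(
        func=lambda x: print(x),
        inputs="params:schema",
        outputs=None,
    )
    # With namespace="test", the input is looked up as "params:test.schema"
    return pipeline([dummy_node], namespace="test")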
N.B : ConfigLoader wise, we use
OmegaConfigLoader
👀 1
n
Please bear with me for a moment, I am going to have a look later.
m
Of course. Thx 🙏🏼 🙂 N.B : I’ve tried all the different ConfigLoaders and have the same issue…
n
Hey @Marc Gris, thanks for your patience. I think it may help to take a step back here to understand what you are trying to do. I think the result here is expected, as you applied the
namespace
so it automatically inserts the
test.
prefix in front of your inputs/outputs. https://docs.kedro.org/en/stable/nodes_and_pipelines/modular_pipelines.html
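To make the prefixing concrete, here is a minimal sketch in plain Python. It is a toy model of the renaming described above, not Kedro's actual implementation; `prefix_name` and the `available` set are hypothetical names used only for illustration.

```python
def prefix_name(name: str, namespace: str) -> str:
    """Toy model of how a namespace rewrites one node input or output name."""
    if name == "parameters":
        # The full parameters dictionary is left untouched.
        return name
    if name.startswith("params:"):
        # Single parameter: the namespace goes right after "params:".
        return f"params:{namespace}.{name[len('params:'):]}"
    # Ordinary dataset name.
    return f"{namespace}.{name}"


# Entries that exist when parameters.yml has a top-level `schema` key.
available = {"parameters", "params:schema"}

# What the namespaced pipeline asks for.
requested = {prefix_name("params:schema", "test")}
missing = requested - available
print(missing)  # {'params:test.schema'}
```

The non-empty `missing` set corresponds to the name reported in the `ValueError` above: the pipeline asks for the prefixed name, while only the un-prefixed one is defined.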
m
Thx @Nok Lam Chan Basically, I’m wondering if there could be a way to have some params that are “global-across-namespace” in order to avoid having to duplicate those with
<namespace-1>.params
,
<namespace-2>.params
etc… especially in the case where those params are identical / unchanged across all namespaces. Do you see what I mean? 🙂
n
Maybe a stupid question, what about not using namespace at all?
m
Hey hey… tricky…
n
Since you are trying to use namespace but then removing all the namespace, maybe there is a reason for it?
m
Ah ah 😅 I do find the “namespacing” very useful… But for params that do actually change… But, I reckon… I’m being a little bit of a pain in the #%s… Since I am trying to avoid having to add
test.schema: ${schema}
in
conf/parameters/modular_pipeline_1.yml
Thx & sorry for your time 🙏🏼 🙂 Have a nice day Marc
n
Sorry if it appears rude, it’s not a rhetorical question: I am struggling to understand what you are trying to achieve. From my understanding, you are trying to escape the namespace for certain parameters, is that correct?
I am still going to circle back to this example - https://docs.kedro.org/en/stable/nodes_and_pipelines/modular_pipelines.html#how-to-use-a-modular-pipeline-with-different-parameters. I hope this explains it more clearly. When you apply the
namespace
argument, everything in your pipeline is now prefixed. If you need to escape the namespace, you have to pass the inputs/outputs/parameters explicitly. The reason you are getting this error is that you only have
params:schema
defined, while the namespaced pipeline looks for the prefixed name:
ValueError: Pipeline input(s) {'params:<namespace>.schema'} not found in the DataCatalog
To use this with different parameters, you can do
pipeline(pipe=some_pipeline, parameters={"params:schema": "params:my_parameters"}, namespace="<namespace>")
It may not be immediately obvious, but there is nothing stopping you from passing the same
params:schema
parameter so it can escape the namespace (i.e. the “global” you are referring to)
pipeline(pipe=some_pipeline, parameters={"params:schema": "params:schema"}, namespace="<namespace>")
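The effect of the `parameters` mapping can be sketched in plain Python as follows. This is a simplified model under the assumption that an explicit mapping takes precedence over the namespace prefix, as described above; `resolve_param` and `_normalize` are hypothetical helpers, not Kedro functions.

```python
from typing import Dict, Optional


def _normalize(name: str) -> str:
    """Accept both 'schema' and 'params:schema' as a parameter name."""
    return name if name.startswith("params:") else f"params:{name}"


def resolve_param(
    name: str,
    namespace: str,
    parameters: Optional[Dict[str, str]] = None,
) -> str:
    """Toy model: explicit mappings win, everything else gets the prefix."""
    mapping = {_normalize(k): _normalize(v) for k, v in (parameters or {}).items()}
    if name in mapping:
        # Explicitly mapped parameter: use the given name as-is.
        return mapping[name]
    # Unmapped parameter: the namespace is inserted after "params:".
    return f"params:{namespace}.{name[len('params:'):]}"


# No mapping: the parameter is namespaced.
print(resolve_param("params:schema", "test"))
# Mapped to itself: the parameter keeps its global name.
print(resolve_param("params:schema", "test", {"params:schema": "params:schema"}))
# Mapped to another parameter.
print(resolve_param("params:schema", "test", {"params:schema": "params:my_parameters"}))
```

Mapping a parameter to itself is what lets it behave like a "global" one: the namespaced pipeline keeps reading `params:schema` from the top-level configuration.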
m
Hey @Nok Lam Chan Thx a lot. “no appearance of rudeness perceived at all”. I highly appreciate the time you’re taking to help me out 🙂 Thanks for your last message: It makes me realize that I’m not completely clear about the
parameters
parameter for modular / namespaced pipelines. I will ponder this and try things out on my side before getting back to you. Thanks (a lot) again 🙏🏼 M.
One last thing… I’ve seen the following syntax for so-called “dataset factory”
"{namespace}.data":
    type: ….
    filepath: …
Is there anything equivalent for parameters ? 🙂 (I’ve tried… and of course.. failed 😅 ) Thx
n
Dataset factories were created for
catalog.yml
specifically, what are your use cases? Maybe you can just give me pseudocode even if this feature doesn’t exist.
m
My use case is still a “bit muddled” by the confusion you can witness in the chat above 😉 I just “have a sense” that it could help minimize redundant boilerplate in parameter files when some parameters are constant / invariant across namespaces while others are not. This syntax would allow a single
"{namespace}.invariant_param" : …
combined with multiple
namespace_1.changing_param: ...
,
namespace_2.changing_param: ...
As soon as things get clearer, I’ll get back to you if needed 👍🏼 🙂
👍🏼 1
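The idea of one invariant entry combined with several per-namespace entries can be sketched in plain Python. This is only an illustration of the desired outcome, not an existing Kedro feature; `expand_shared` is a hypothetical helper and the parameter values are made up.

```python
from copy import deepcopy
from typing import Any, Dict


def expand_shared(
    shared: Dict[str, Any],
    per_namespace: Dict[str, Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Copy every shared parameter under each namespace; namespace values win."""
    expanded = {}
    for namespace, overrides in per_namespace.items():
        merged = deepcopy(shared)  # avoid sharing mutable values between namespaces
        merged.update(overrides)
        expanded[namespace] = merged
    return expanded


# Declared once, instead of once per namespace (illustrative value).
shared = {"invariant_param": 42}
per_namespace = {
    "namespace_1": {"changing_param": "a"},
    "namespace_2": {"changing_param": "b"},
}

params = expand_shared(shared, per_namespace)
print(params["namespace_1"])  # {'invariant_param': 42, 'changing_param': 'a'}
print(params["namespace_2"])  # {'invariant_param': 42, 'changing_param': 'b'}
```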
n
Thanks! Or even feel free to open more GitHub issues, as you already did. Going through the namespace pipeline docs, I found that there are many things we can improve; that is probably part of what causes so much confusion.
👍🏼 1
m
Thanks! I’ll actually go ahead with opening a GitHub issue 👍🏼 I do think that consistency of syntax across
catalog.yml
&
parameters.yml
is a desirable thing 🙂
👍🏼 1
Hi @Nok Lam Chan I’m still pondering / “chewing” on the subject. Going through the docs I see this (cf. screenshot) Is that a typo ?
👀 1
n
In a full day meeting today, will come back to it tomorrow.
👍🏼 1
@Marc Gris Sorry for the late response. I am making a change to remove this.
params
was not namespaced pre-0.18, but that was changed; the docs should reflect this. https://github.com/kedro-org/kedro/pull/2839
👍🏼 1
m
Thx @Nok Lam Chan My turn to apologize: I was on holiday for a week. Hence my late answer. Thx again M
j
I went ahead and opened an issue 🙂 https://github.com/kedro-org/kedro/issues/3086