# questions
p
Hey team, I am using `.` (dot-separated) namespacing in my parameters files. I suppose `OmegaConfigLoader` creates a flat dictionary instead of a nested, namespace-based dictionary. Should this flat-to-nested conversion be part of `OmegaConfigLoader`? Example:
# parameters.yaml
namespace1.sub1.key1: value1
namespace1.sub2.key2: value2

# resolved config by kedro OmegaConfigLoader:
{"namespace1.sub1.key1": "value1", "namespace1.sub2.key2": "value2"}

# expected by OmegaConf (for `select` method to work correctly):
{"namespace1": {"sub1": {"key1": "value1"}, "sub2": {"key2": "value2"}}}
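A minimal sketch of why only the nested form works with a dotted lookup, assuming the lookup splits the path on `.` and descends one key per segment. The `select` function below is a hypothetical plain-Python model for illustration, not OmegaConf's implementation of `OmegaConf.select`:

```python
from collections.abc import Mapping
from typing import Any


def select(cfg: Mapping, path: str, default: Any = None) -> Any:
    """Walk a nested mapping one dot-separated segment at a time.

    Simplified model of a dotted-path lookup, not OmegaConf's actual code.
    """
    node: Any = cfg
    for segment in path.split("."):
        # Each segment must be a key of the current mapping. A flat key such as
        # "namespace1.sub1.key1" never matches the single segment "namespace1".
        if not isinstance(node, Mapping) or segment not in node:
            return default
        node = node[segment]
    return node


flat = {"namespace1.sub1.key1": "value1", "namespace1.sub2.key2": "value2"}
nested = {"namespace1": {"sub1": {"key1": "value1"}, "sub2": {"key2": "value2"}}}

print(select(flat, "namespace1.sub1.key1"))    # None
print(select(nested, "namespace1.sub1.key1"))  # value1
print(select(nested, "namespace1"))            # {'sub1': {'key1': 'value1'}, 'sub2': {'key2': 'value2'}}
```

With the flat layout the first segment already fails, so neither a leaf nor a whole namespace can be selected by its dotted path.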
s
Hi, thank you for your question. You're right that `OmegaConfigLoader` in Kedro currently keeps dot-separated keys flat when it loads parameters. I think this is intentional, as `OmegaConfigLoader` does not automatically convert a flat dictionary into a nested one, and this approach works well for most Kedro use cases. But feel free to open an issue suggesting we implement this on the Kedro side.
p
The only reason I would like this on the Kedro side is that I have a pipeline and sub-pipeline folder structure in my conf:
conf
├── base
│   ├── pipeline1
│   │   ├── subpipeline1_1
│   │   │   ├── catalog.yaml
│   │   │   └── parameters.yaml
│   │   └── subpipeline1_2
│   │       ├── catalog.yaml
│   │       └── parameters.yaml
│   └── pipeline2
│       ├── subpipeline2_1
│       │   ├── catalog.yaml
│       │   └── parameters.yaml
│       └── subpipeline2_2
│           ├── catalog.yaml
│           └── parameters.yaml
└── local
    └── credentials.yaml
Now if I have parameters like:
# pipeline1/subpipeline1_1/parameters.yaml
pipeline1.subpipeline1_1.myparam: myval

# pipeline1/subpipeline1_2/parameters.yaml
pipeline1.subpipeline1_2.myparam: myval
Now, if I want to retrieve all the pipeline1 params together, I have to define them in a single params file. I can't even do the following in the two files:
# pipeline1/subpipeline1_1/parameters.yaml
pipeline1:
  subpipeline1_1:
    myparam: myval

# pipeline1/subpipeline1_2/parameters.yaml
pipeline1:
  subpipeline1_2:
    myparam: myval
I believe Kedro raised an error when I tried this, which I need to recheck (probably that the parameter `params:pipeline1` already exists).
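To illustrate why `params:pipeline1` only exists with the nested layout, here is a simplified sketch. It assumes Kedro registers one `params:` entry for every key in the parameter tree, including each nested sub-dict under its dotted name; `params_feed_dict` is a hypothetical helper written for illustration, not Kedro's code:

```python
from typing import Any, Dict


def params_feed_dict(parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Register one "params:<path>" entry per key, recursing into nested dicts.

    Simplified sketch of how nested parameters become addressable by dotted
    names; not Kedro's actual implementation.
    """
    feed: Dict[str, Any] = {}

    def add(name: str, value: Any) -> None:
        feed[f"params:{name}"] = value
        # Sub-dicts also get their own entries, e.g. "params:a" and "params:a.b"
        if isinstance(value, dict):
            for key, child in value.items():
                add(f"{name}.{key}", child)

    for key, value in parameters.items():
        add(key, value)
    return feed


flat = {"pipeline1.subpipeline1_1.myparam": "myval"}
nested = {"pipeline1": {"subpipeline1_1": {"myparam": "myval"}}}

print(sorted(params_feed_dict(flat)))
# ['params:pipeline1.subpipeline1_1.myparam']
print(sorted(params_feed_dict(nested)))
# ['params:pipeline1', 'params:pipeline1.subpipeline1_1', 'params:pipeline1.subpipeline1_1.myparam']
```

With flat dotted keys there is no `pipeline1` dict to recurse from, so only the full leaf path is registered.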
s
I see, that makes sense given your use case. I think it's worth raising this as a potential enhancement; it would allow more flexible handling of nested parameters.
p
Issue: https://github.com/kedro-org/kedro/issues/4077
If you want, I can raise a PR for it, but only if the team agrees this is the right thing to do.
n
p
Support the creation of nested dict from flat dict?
s
I don't think it does; it's something we'll have to implement.
p
Yes, it doesn't. We'll have to implement it ourselves.
n
I think it would be interesting to see what it takes to add this support. I can see the value of top-level nested parameters to keep sub-pipeline parameters from being deeply nested (less so at the intermediate level, as it starts to create more confusion). Would you be open to experimenting with it?
p
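What do you mean by top-level nested params? Can you please explain with an example? If you ask me, something like this would work:
def unflatten(config: dict) -> dict:
    """Convert flat dot-separated keys, e.g. {"a.b": 1}, into nested dicts: {"a": {"b": 1}}."""
    nested_config = {}
    for key, value in config.items():
        parts = key.split(".")
        d = nested_config
        # Create (or reuse) one intermediate dict per namespace segment
        for part in parts[:-1]:
            d = d.setdefault(part, {})
        # The last segment holds the actual value
        d[parts[-1]] = value
    return nested_config
The code above should do the job.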
I guess we could even classify this as a bug, since Kedro uses OmegaConf under the hood in its default config loader, and OmegaConf expects a nested tree to work correctly?
n
My view on this is that it's okay to have nested parameters, but only for the top-level keys. What I mean is, this is okay:
A.b.c:
  D: 1
This is bad, or at least I don't want it enabled by default (sorry for the casing, I'm typing on my phone):
A:
  B.c.d: 1
These result in the same tree, but the second encourages a bad pattern and makes the config more difficult to reason about. I don't think this is a bug; as you mention, this is not something OmegaConf supports by default. I am also unsure, in the case of ambiguity, which one OmegaConf would resolve to, or whether it would simply raise an error.
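A rough sketch of the two spellings discussed above (written in lowercase), assuming dotted keys are expanded at every nesting level. `expand_dotted_keys` is a hypothetical helper; raising `ValueError` on conflicting definitions is one possible answer to the ambiguity question, not necessarily what OmegaConf or Kedro would do:

```python
from typing import Any, Dict, List


def _insert(node: Dict[str, Any], parts: List[str], value: Any, full_key: str) -> None:
    """Place `value` at the path `parts` inside `node`, creating dicts as needed."""
    head, *rest = parts
    if rest:
        child = node.setdefault(head, {})
        if not isinstance(child, dict):
            # e.g. "a.b" already holds a value and "a.b.c" wants a sub-tree there
            raise ValueError(f"conflicting definitions for {full_key!r}")
        _insert(child, rest, value, full_key)
    elif isinstance(value, dict) and isinstance(node.get(head), dict):
        # Both spellings define a sub-tree here: merge them key by key
        for key, child_value in value.items():
            _insert(node[head], [key], child_value, full_key)
    elif head in node:
        # Same leaf defined twice, or a value clashing with a sub-tree
        raise ValueError(f"conflicting definitions for {full_key!r}")
    else:
        node[head] = value


def expand_dotted_keys(config: Dict[str, Any]) -> Dict[str, Any]:
    """Turn dot-separated keys into nested dicts, at any depth."""
    result: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = expand_dotted_keys(value)  # expand inner dotted keys first
        _insert(result, key.split("."), value, key)
    return result


print(expand_dotted_keys({"a.b.c": {"d": 1}}))  # {'a': {'b': {'c': {'d': 1}}}}
print(expand_dotted_keys({"a": {"b.c.d": 1}}))  # {'a': {'b': {'c': {'d': 1}}}}
```

Both spellings produce the same tree, which is exactly why allowing dotted keys at every level makes it harder to see where a value is defined; `{"a.b": 1, "a": {"b": 2}}` is rejected rather than silently resolved.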
p
I think OmegaConf expects a nested dict rather than a flat dict with namespaced keys. In that case, I suppose the user always has to specify their params as follows:
# conf/base/namespace1/sub-namespace1.yaml
namespace1:
  sub-namespace1:
    key1: value1

# conf/base/namespace1/sub-namespace2.yaml
namespace1:
  sub-namespace2:
    key2: value2
    key3: value3
The problem is that Kedro doesn't allow the user to create such a structure across two different files, because the root key (namespace1) is the same in both. Since the user can't write what OmegaConf requires, and Kedro has no functionality to convert `.`-separated flat namespaced keys into the nested dicts that OmegaConf expects, does that classify as a bug? Also, see the following:
oc
Out[25]: {'a.b': 1, 'a.c': 2}

OmegaConf.select(oc, "a.b")

OmegaConf.select(oc, ".a.b")

OmegaConf.select(oc, ".")
Out[28]: {'a.b': 1, 'a.c': 2}
n
I think OmegaConf expects a nested dict instead of a flat dict with keys having namespaces.
In [1]: from omegaconf import OmegaConf
In [2]: conf = {"a.b.c": 1}

In [3]: omega_conf = OmegaConf.create(conf)

In [4]: omega_conf
Out[4]: {'a.b.c': 1}
This is what I mean, and I disagree that OmegaConf expects a flat dict.
p
Can you please try selecting `"a.b.c"`? Also, just to be clear, I said it expects a nested dict, not a flat dict, for any namespaced keys.
n
# conf/base/namespace1/sub-namespace1.yaml
namespace1:
  sub-namespace1:
    key1: value1

# conf/base/namespace1/sub-namespace2.yaml
namespace1:
  sub-namespace2:
    key2: value2
    key3: value3
This is a valid issue; could you open an issue for it? I tried commenting out the duplicate-key validation, and this is the result. It doesn't require any change to how the config is written, so it would be an easier change, and it's more of a bug in the validation itself.
{
    'namespace1': {
        'sub-namespace2': {'key1': 'value1'},
        'sub-namespace1': {'key1': 'value1'}
    }
}
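For reference, once the duplicate check is out of the way, combining two files that share a root key is just a recursive merge. A minimal plain-Python sketch using the pipeline1 parameters from earlier in the thread; `deep_merge` is a hypothetical helper (OmegaConf's own `OmegaConf.merge` also merges nested configs recursively, but this is not its code):

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], other: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `other` into a copy of `base`.

    Nested dicts are merged key by key; for any other value type,
    the value from `other` wins. Neither input is modified.
    """
    merged = dict(base)
    for key, value in other.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# The two nested parameters.yaml files from the pipeline1 example, as dicts
sub1_1 = {"pipeline1": {"subpipeline1_1": {"myparam": "myval"}}}
sub1_2 = {"pipeline1": {"subpipeline1_2": {"myparam": "myval"}}}

print(deep_merge(sub1_1, sub1_2))
# {'pipeline1': {'subpipeline1_1': {'myparam': 'myval'}, 'subpipeline1_2': {'myparam': 'myval'}}}
```

The shared `pipeline1` key is not a real conflict: the two files contribute disjoint sub-trees.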
p
If you can fix the duplicate-key issue, that would solve it IMO. Let me raise another issue.
Or maybe I can add it to the existing issue.
Please let me know if I can help with a PR.
n
It would be best to have two issues IMO:
1. #4077 is a feature request that changes how Kedro parses the configuration. This has broader, less predictable effects on how Kedro works, particularly on how people can use the YAML config, and could introduce inconsistency between Kedro and OmegaConf.
2. The second issue is a smaller one: just fixing the duplication logic. That config is not technically a duplicate, so Kedro shouldn't block it (and you can certainly create the config with `OmegaConf.create`).
I'd take a PR for the second issue; the first needs more discussion. @Ankita Katiyar any thoughts on this?
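A sketch of what a less strict duplicate check could look like: report a key only when its full dotted path is defined in both files and at least one side is not a dict, and recurse otherwise. `overlapping_keys` is a hypothetical helper, not the code from the Kedro PR:

```python
from typing import Any, Dict, List


def overlapping_keys(a: Dict[str, Any], b: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return the dotted paths that both `a` and `b` define and that would clash.

    A key present in both is only reported when at least one side is not a
    dict; if both sides are dicts, the check recurses one level deeper.
    """
    clashes: List[str] = []
    for key in a.keys() & b.keys():
        path = f"{prefix}{key}"
        if isinstance(a[key], dict) and isinstance(b[key], dict):
            clashes.extend(overlapping_keys(a[key], b[key], prefix=f"{path}."))
        else:
            clashes.append(path)
    return sorted(clashes)


file1 = {"namespace1": {"sub-namespace1": {"key1": "value1"}}}
file2 = {"namespace1": {"sub-namespace2": {"key2": "value2", "key3": "value3"}}}
file3 = {"namespace1": {"sub-namespace1": {"key1": "other"}}}

print(overlapping_keys(file1, file2))  # []
print(overlapping_keys(file1, file3))  # ['namespace1.sub-namespace1.key1']
```

A shared top-level key such as `namespace1` is then allowed, while a genuinely repeated leaf is still reported.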
p
Awesome! Thanks!
πŸ‘πŸΌ 1
n
Amazing! Thanks for putting thought into it. Feel free to raise a PR to fix the validation error.
p
I'm creating a new issue for this as you requested, and will raise a PR right after that.
πŸ‘πŸΌ 1
n
I opened an issue on the OmegaConf side (https://github.com/omry/omegaconf/issues/1189) to get some clarification. But their development seems pretty slow these days, so I'm not sure anyone will respond.
p
Haha, same, but let's wait for their answer.
Hey @Merel! FYI ^ PR link: https://github.com/kedro-org/kedro/pull/4092 Would be great to get your reviews as well @Nok Lam Chan and @Sajid Alam
A reminder to please review the PR. 😊
m
Hi @Puneet Saini, I'm catching up on this now. Can you explain how you've namespaced your pipelines? I'm not entirely sure I understand how you end up with the same top-level key for your parameters if you're using namespaces.
p
Hi @Merel! Thank you for looking into this. As I explained in this message, I am splitting my params by pipeline and sub-pipeline. For example:
# conf/base/pipeline_1/subpipeline_1_1/parameters.yaml
pipeline_1.subpipeline_1_1.param1: value1
pipeline_1.subpipeline_1_1.param2:
  - value2
  - value3

# conf/base/pipeline_1/subpipeline_1_2/parameters.yaml
pipeline_1.subpipeline_1_2.param1: value4
Now, if I want to use all the parameters for `pipeline_1`, that would be a combination of both files. Or, failing that, something more granular should also be possible, for example `pipeline_1.subpipeline_1_1`. To get either, I would need to refactor my parameters as follows:
# conf/base/pipeline_1/subpipeline_1_1/parameters.yaml
pipeline_1:
  subpipeline_1_1:
    param1: value1
    param2:
      - value2
      - value3

# conf/base/pipeline_1/subpipeline_1_2/parameters.yaml
pipeline_1:
  subpipeline_1_2:
    param1: value4
But this won't work, because Kedro sees `pipeline_1` as a duplicate top-level key. Ideally, I believe we should not have `.`-separated keys in parameters, because OmegaConf's `select` method doesn't recognize them; for namespacing you need a nested key-value structure (you can find more on this in the OmegaConf issue). So my PR addresses this: if files share the same top-level key but define different keys underneath, the validation check in Kedro's `OmegaConfigLoader` shouldn't fail. I hope this explains it well. Please feel free to ask any follow-ups. Happy to help!
m
I'm starting to get it a bit more. The part I'm struggling to understand is why `pipeline_1` ends up being the top-level key. Can you describe how this setup looks for a real-world application? It's a bit hard to reason about with placeholder names like pipeline_1 and subpipeline_1. This is how we describe pipeline reuse ourselves: https://docs.kedro.org/en/stable/nodes_and_pipelines/namespaces.html, so you wouldn't end up with the same top-level key if you use namespaces like that. I'm wondering if something about your project design could be changed instead of adding this new feature.
p
Open to hearing your thoughts. My project looks roughly like the following (I can't share the whole thing, sorry):
1. I have a multi-country pipeline, so you can imagine each country as a top-level namespace.
2. The pipeline, which is reused across multiple countries, is divided into sub-pipelines, for example intermediate, primary, feature, and predictive-modeling (the latter contains two different sub-pipelines, which go into a third level of namespacing).
TL;DR: In total I have three levels of namespaces: countries, pipelines, and sub-pipelines. There may be more going forward, since I am dealing with pharma data (which includes reusing components across different diseases). It would be great to get your views on the project structure.
PS: Thank you for sharing the documentation on namespacing. I didn't know it existed, but I suppose it only covers two levels of namespacing?
m
Thanks for sharing the context, Puneet, this makes sense now. I'll review the PR ASAP! The namespacing docs only show one simple example, but Kedro allows namespaced pipelines to be nested arbitrarily deep.