Puneet Saini
08/09/2024, 8:51 AM.
dot separated namespacing in my parameters files. I suppose OmegaConfigLoader creates a flat dictionary instead of a nested, namespace-based dictionary. Should this conversion from a flat to a nested dictionary be part of OmegaConfigLoader?
Example:
# parameters.yaml
namespace1.sub1.key1: value1
namespace1.sub2.key2: value2
# resolved config by kedro OmegaConfigLoader:
{"namespace1.sub1.key1": "value1", "namespace1.sub2.key2": "value2"}
# expected by OmegaConf (for `select` method to work correctly):
{"namespace1": {"sub1": {"key1": "value1"}, "sub2": {"key2": "value2"}}}
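A minimal sketch of why the nested shape matters for dotted lookups. `select_path` is a hypothetical pure-Python helper written for illustration only; it is not OmegaConf's implementation, it just walks a dict one dot-separated segment at a time.

```python
from typing import Any, Optional


def select_path(config: dict, dotted_key: str) -> Optional[Any]:
    """Walk a nested dict one dot-separated segment at a time (hypothetical helper)."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None  # a segment is missing, so the lookup fails
        node = node[part]
    return node


# The two shapes from the example above.
flat = {"namespace1.sub1.key1": "value1", "namespace1.sub2.key2": "value2"}
nested = {"namespace1": {"sub1": {"key1": "value1"}, "sub2": {"key2": "value2"}}}

print(select_path(flat, "namespace1.sub1.key1"))    # None: there is no "namespace1" key
print(select_path(nested, "namespace1.sub1.key1"))  # value1
print(select_path(nested, "namespace1.sub1"))       # {'key1': 'value1'}
```

With the flat dict, the whole dotted string is a single key, so a segment-by-segment lookup has nothing to descend into.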
Sajid Alam
08/09/2024, 9:34 AM
OmegaConfigLoader in Kedro currently flattens the dictionary when it loads parameters, and I think this is intentional as OmegaConfigLoader does not automatically convert the flat dictionary into a nested one. I think this approach works mostly well for Kedro.
But feel free to open an issue suggesting we implement this on kedro side.
Puneet Saini
08/09/2024, 9:48 AM
conf
├── base
│   ├── pipeline1
│   │   ├── subpipeline1_1
│   │   │   ├── catalog.yaml
│   │   │   └── parameters.yaml
│   │   └── subpipeline1_2
│   │       ├── catalog.yaml
│   │       └── parameters.yaml
│   └── pipeline2
│       ├── subpipeline2_1
│       │   ├── catalog.yaml
│       │   └── parameters.yaml
│       └── subpipeline2_2
│           ├── catalog.yaml
│           └── parameters.yaml
└── local
    └── credentials.yaml
Now if I have parameters like:
# pipeline1/subpipeline1_1/parameters.yaml
pipeline1.subpipeline1_1.myparam: myval
# pipeline1/subpipeline1_2/parameters.yaml
pipeline1.subpipeline1_2.myparam: myval
Now, if I want to retrieve the pipeline1 params, I will have to define them in a single params file. I can't even do the following in the two files:
# pipeline1/subpipeline1_1/parameters.yaml
pipeline1:
  subpipeline1_1:
    myparam: myval
# pipeline1/subpipeline1_2/parameters.yaml
pipeline1:
  subpipeline1_2:
    myparam: myval
Because then, I guess, there was some error in Kedro which I need to recheck (probably that the param params:pipeline1 already exists).
Sajid Alam
08/09/2024, 10:06 AM
Puneet Saini
08/09/2024, 10:23 AM
Nok Lam Chan
08/09/2024, 12:18 PM
Puneet Saini
08/09/2024, 12:46 PM
Sajid Alam
08/09/2024, 12:47 PM
Puneet Saini
08/09/2024, 12:48 PM
Nok Lam Chan
08/09/2024, 9:41 PM
Puneet Saini
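08/10/2024, 5:11 AM
# `config` is the flat dict with dot-separated keys
nested_config = {}
for key, value in config.items():
    parts = key.split('.')
    d = nested_config
    # walk (and create) the intermediate dicts for every part except the last
    for part in parts[:-1]:
        if part not in d:
            d[part] = {}
        d = d[part]
    # the last part is the leaf key that holds the value
    d[parts[-1]] = value
The above code should do the job.
Puneet Saini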
08/13/2024, 5:17 PM
Nok Lam Chan
08/13/2024, 6:16 PM
Puneet Saini
08/14/2024, 6:42 AM
# conf/base/namespace1/sub-namespace1.yaml
namespace1:
  sub-namespace1:
    key1: value1
# conf/base/namespace1/sub-namespace2.yaml
namespace1:
  sub-namespace2:
    key2: value2
    key3: value3
The problem with this is that Kedro doesn't allow the user to create such a structure in two different files, because the root key (namespace1) is the same in both. So the user can't specify what OmegaConf requires, and Kedro has no functionality to convert dot-separated flat namespaced keys into the nested dicts that OmegaConf expects. In that sense, does it classify as a bug?
Also, see the following:
In [25]: oc
Out[25]: {'a.b': 1, 'a.c': 2}
In [26]: OmegaConf.select(oc, "a.b")
In [27]: OmegaConf.select(oc, ".a.b")
In [28]: OmegaConf.select(oc, ".")
Out[28]: {'a.b': 1, 'a.c': 2}
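For illustration, a pure-Python sketch of converting the flat dict from the session above into the nested shape, after which an ordinary nested lookup works. `unflatten` is a hypothetical helper name, not part of Kedro or OmegaConf, and the conflict checks are an assumption about how clashes (for example both `a` and `a.b` being defined) might reasonably be handled.

```python
from typing import Any, Dict


def unflatten(flat: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Turn {'a.b': 1} into {'a': {'b': 1}} (hypothetical helper, not a Kedro API)."""
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # e.g. 'a' already holds a plain value and 'a.b' now needs a dict there
                raise ValueError(f"{key!r} conflicts with an existing value at {part!r}")
            node = child
        if isinstance(node.get(leaf), dict):
            # e.g. 'a.b' was seen first and a plain 'a' would now overwrite it
            raise ValueError(f"{key!r} would overwrite nested keys")
        node[leaf] = value
    return nested


oc_flat = {"a.b": 1, "a.c": 2}
oc_nested = unflatten(oc_flat)
print(oc_nested)            # {'a': {'b': 1, 'c': 2}}
print(oc_nested["a"]["b"])  # 1
```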
Nok Lam Chan
08/14/2024, 10:56 AM
I think OmegaConf expects a nested dict instead of a flat dict with keys having namespaces.
In [1]: from omegaconf import OmegaConf
In [2]: conf = {"a.b.c": 1}
In [3]: omega_conf = OmegaConf.create(conf)
In [4]: omega_conf
Out[4]: {'a.b.c': 1}
This is what I mean, and I disagree that OmegaConf expects a flat dict.
Puneet Saini
08/14/2024, 10:57 AM
"a.b.c"? Also, just to be clear, I said it expects a nested dict, not a flat dict, for any namespaced keys.
Nok Lam Chan
08/14/2024, 10:57 AM
# conf/base/namespace1/sub-namespace1.yaml
namespace1:
  sub-namespace1:
    key1: value1
# conf/base/namespace1/sub-namespace2.yaml
namespace1:
  sub-namespace2:
    key2: value2
    key3: value3
This is a valid issue; could you open an issue for this? I tried commenting out the duplicate validation, and this is the result. This doesn't require any change to the config, so it would be an easier change and more of a bug in the validation itself.
{
    'namespace1': {
        'sub-namespace2': {'key1': 'value1'},
        'sub-namespace1': {'key1': 'value1'}
    }
}
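A rough sketch of what a relaxed duplicate check could look like: merge the files recursively and fail only when the same leaf path is defined twice. This is not Kedro's validation code; `merge_nested` is a hypothetical helper, and the input dicts assume that key2 and key3 both sit under sub-namespace2 in the second file.

```python
from typing import Any, Dict


def merge_nested(base: Dict[str, Any], other: Dict[str, Any], path: str = "") -> Dict[str, Any]:
    """Recursively merge `other` into a copy of `base`; raise only on duplicate leaf keys."""
    merged = dict(base)
    for key, value in other.items():
        full_key = f"{path}.{key}" if path else key
        if key not in merged:
            merged[key] = value
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            # same namespace in both files: descend instead of failing
            merged[key] = merge_nested(merged[key], value, full_key)
        else:
            raise ValueError(f"Duplicate key found: {full_key}")
    return merged


# Assumed contents of the two parameter files (hypothetical parse result).
file1 = {"namespace1": {"sub-namespace1": {"key1": "value1"}}}
file2 = {"namespace1": {"sub-namespace2": {"key2": "value2", "key3": "value3"}}}

merged = merge_nested(file1, file2)
print(merged)
```

Sharing the root key `namespace1` is accepted here, while defining `namespace1.sub-namespace1.key1` in two files would still raise.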
Puneet Saini
08/14/2024, 11:00 AM
Puneet Saini
08/14/2024, 11:00 AM
Puneet Saini
08/14/2024, 11:01 AM
Puneet Saini
08/14/2024, 11:04 AM
Nok Lam Chan
08/14/2024, 11:04 AM
OmegaConf.create
Nok Lam Chan
08/14/2024, 11:05 AM
Puneet Saini
08/14/2024, 11:06 AM
Nok Lam Chan
08/14/2024, 11:15 AM
Puneet Saini
08/14/2024, 11:16 AM
Nok Lam Chan
08/14/2024, 11:25 AM
Puneet Saini
08/14/2024, 11:26 AM
Puneet Saini
08/15/2024, 10:51 AM
Puneet Saini
08/19/2024, 7:59 AM
Merel
08/19/2024, 3:41 PM
Puneet Saini
08/19/2024, 4:06 PM
# conf/base/pipeline_1/subpipeline_1_1/parameters.yaml
pipeline_1.subpipeline_1_1.param1: value1
pipeline_1.subpipeline_1_1.param2:
- value2
- value3
# conf/base/pipeline_1/subpipeline_1_2/parameters.yaml
pipeline_1.subpipeline_1_2.param1: value4
Now, if I want to use all the parameters for pipeline_1, that would be a combination of both these params files. Something more granular should also be possible, for example pipeline_1.subpipeline_1_1.
I might need to refactor my parameters as follows:
# conf/base/pipeline_1/subpipeline_1_1/parameters.yaml
pipeline_1:
  subpipeline_1_1:
    param1: value1
    param2:
      - value2
      - value3
# conf/base/pipeline_1/subpipeline_1_2/parameters.yaml
pipeline_1:
  subpipeline_1_2:
    param1: value4
But this won't work because Kedro sees pipeline_1 as a duplicate top-level key. Ideally, I believe we should not have dot-separated keys in parameters, because OmegaConf doesn't recognise them in its select method; you have to use a nested key-value structure for namespacing. (You can also find more on this OmegaConf issue)
So my PR addresses this: if files have the same top-level keys but different keys inside, the validation check in OmegaConfigLoader in Kedro shouldn't fail.
I hope this explains it well. Please feel free to ask any follow-ups. Happy to help.
Merel
08/19/2024, 4:29 PM
pipeline_1 ends up being the top-level key. Can you describe how this setup looks for a real-world application? It's a bit hard to reason with these fake names of pipeline_1 and subpipeline_1 etc. This is how we describe pipeline reuse ourselves: https://docs.kedro.org/en/stable/nodes_and_pipelines/namespaces.html, so you wouldn't end up with the same top-level key if you use namespaces like that.
I'm wondering if there's something that can be changed about your project design instead of adding this new feature.
Puneet Saini
08/19/2024, 5:09 PM
Merel
08/20/2024, 9:30 AM