Hi Team :kedro: Issue: I my setup, the catalog ov...
# questions
a
Hi Team, Issue: In my setup, the catalog overriding is not happening when I run the Kedro pipeline using a different env. Kedro version:
0.19.6
config loader:
OmegaConfigLoader
In
conf/base/catalog
globals.yml
has the following:
_base_path: gs://dev-bucket-shared
datasets.yml
has the following:
my_dataset:
  type: pandas.CSVDataset
  filepath: "${_base_path}/03_primary/my_dataset.csv"
In
conf/local/catalog
globals.yml
has the following:
_base_path: gs://dev-bucket-users/my_name
• There is nothing else I put under
conf/local
• The local conf should override the base conf when
kedro run --env local --pipeline my_pipeline
is run, but it does not. It is still using what is present in
conf/base/catalog
Seems like something really basic I am doing wrong, any help is appreciated! 🙂
n
Hi @Abhishek Bhatia, I guess you migrated from an old code base? With OmegaConfigLoader, globals are used via a resolver rather than plain templating (the old way, or "interpolation" in OmegaConf terms): https://docs.kedro.org/en/stable/configuration/config_loader_migration.html I suggest checking out the migration guide and reading the OmegaConfigLoader documentation; the two loaders look similar at times but have a lot of differences as well.
If you search for "globals resolver" you should be able to find examples in the docs.
a
Thanks @Nok Lam Chan. However, I am slightly confused. I was able to make it work now, though. Is the following correct? 1. If I have to override
catalog/globals.yml
across environments, I can't do it unless I move everything to
conf/base/globals.yml
2. If I do keep my catalog specific globals in
catalog/globals.yml
then I have to make the entries start with
_
3. In "normal"
globals.yml
, I can't prefix them by
_
4. I reference globals as
{globals:some_key}
n
@Abhishek Bhatia Hey! I can see there is some confusion about the terminology. I actually dislike the name of
globals.yml
in
catalog/globals.yml
. That was created before the
${globals}
was implemented, so there is some overlapping terminology.
Let me explain this better. 1. There is one thing that is truly "global", which is the
globals.yml
. This is the only thing that is global in the config loader, and you can reference it with
${globals:}
in any type of configuration, i.e.
parameters
,
catalog
etc
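# Default file patterns per config type (excerpt from OmegaConfigLoader)
self.config_patterns = {
    "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
    "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
    "credentials": ["credentials*", "credentials*/**", "**/credentials*"],
    "globals": ["globals.yml"],  # only this exact file name is treated as globals
}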
2. There is another thing that is called
catalog_globals.yml
or
parameters_globals.yml
. These are nothing but templating; you cannot refer to them as
${globals}
in your configuration. In fact, if you copy all the content of
catalog_globals.yml
->
catalog.yml
, the result is identical. This is more of a convention to separate the template values out into their own file; it is not mandatory.
a
Ah! I see, I understand now. For me, these templating variables in
catalog/globals.yml
(technically not globals, now I understand), which must start with
_
were not getting overridden in
local
vs
base
, so I switched to pure globals
n
3.
_
prefix can always be used (correct me if I am wrong). The reason you would want to prefix with
_
is: • Convention: it makes clear that the value is meant to be used as a template value rather than actual configuration • For
catalog.yml
, it's mandatory (not for parameters). This has to do with Kedro's DataCatalog validation. By using
_
the entry bypasses that validation, and we know that the value is only a template value, e.g. _base_bucket: my_s3_bucket, rather than an invalid dataset entry.
a
So, exactly this.
_base_bucket
is different in
base
env vs
local
env (user-specific bucket), but was not getting overridden
So maybe the resolution order of catalog entries is different? In base, all catalog entries get resolved, then they get overridden by whatever is in local. Since I don't explicitly override catalog entries, but rather just the
_base_path
so the catalog entries in
base
remain same as
local
(which is not the intended outcome)
It's a bit confusing, but I think this might be happening
_
cannot be used in globals, I think
n
I opened an issue for the documentation here: https://github.com/kedro-org/kedro/issues/4018. It would be great if you could add your thoughts there.
a
Great thanks, will add! 🙂
n
So maybe the resolution order of catalog entries is different?
In base, all catalog entries get resolved, then they get overridden by whatever is in local. Since I don't explicitly override catalog entries, but rather just the
_base_path
so the catalog entries in
base
remain same as
local
(which is not the intended outcome)
This is true, and it has always been the case for Kedro configuration environments. Config is resolved within its own environment (otherwise there would be no point in having separate environments). It will get overridden by the
local
environment, but without any further resolution (basically a dictionary merge).
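To make the resolve-then-merge behaviour concrete, here is a minimal pure-Python sketch. It only simulates the idea described above and is not Kedro's actual implementation: `resolve_within_env` is a hypothetical helper, and the dictionaries stand in for the YAML files from the original question.

```python
import re


def resolve_within_env(env_config: dict) -> dict:
    """Resolve ${key} references using only keys defined in the same environment."""

    def substitute(value):
        if isinstance(value, dict):
            return {k: substitute(v) for k, v in value.items()}
        if isinstance(value, str):
            # Replace every ${name} with the top-level value from this environment.
            return re.sub(
                r"\$\{(\w+)\}", lambda m: str(env_config[m.group(1)]), value
            )
        return value

    resolved = {key: substitute(value) for key, value in env_config.items()}
    # Template keys (leading underscore) are dropped once resolution is done.
    return {k: v for k, v in resolved.items() if not k.startswith("_")}


# Stand-ins for conf/base/catalog and conf/local/catalog.
base = {
    "_base_path": "gs://dev-bucket-shared",
    "my_dataset": {
        "type": "pandas.CSVDataset",
        "filepath": "${_base_path}/03_primary/my_dataset.csv",
    },
}
local = {"_base_path": "gs://dev-bucket-users/my_name"}

# Each environment is resolved on its own, then merged like plain dictionaries.
merged = {**resolve_within_env(base), **resolve_within_env(local)}
print(merged["my_dataset"]["filepath"])
# gs://dev-bucket-shared/03_primary/my_dataset.csv
```

In this sketch the local environment contributes nothing after resolution, because its only key is a template key, so the base entry survives the merge unchanged.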
_
cannot be used in globals, I think
Let me double check on this
a
So it's better to use
base_path
in
globals.yml
then as opposed to using it as a templating variable
_base_path
Then catalog entries get correctly overridden
I think not just better, but rather needed (at least I wasn't able to set it up with templating variables)
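For contrast, here is a rough sketch of why the globals route behaves differently, assuming (as described in this thread) that globals.yml from base and local are merged first and the catalog is resolved against the merged result. `resolve_globals` is a hypothetical helper that only mimics the `${globals:key}` syntax; it is not the Kedro resolver.

```python
import re


def resolve_globals(config, globals_config: dict):
    """Replace ${globals:key} references with values from the merged globals."""
    if isinstance(config, dict):
        return {k: resolve_globals(v, globals_config) for k, v in config.items()}
    if isinstance(config, str):
        return re.sub(
            r"\$\{globals:\s*(\w+)\}",
            lambda m: str(globals_config[m.group(1)]),
            config,
        )
    return config


# Stand-ins for conf/base/globals.yml and conf/local/globals.yml.
base_globals = {"base_path": "gs://dev-bucket-shared"}
local_globals = {"base_path": "gs://dev-bucket-users/my_name"}

# Globals are merged across environments before anything refers to them.
merged_globals = {**base_globals, **local_globals}

# The catalog entry lives only in base and refers to the global key.
base_catalog = {
    "my_dataset": {
        "type": "pandas.CSVDataset",
        "filepath": "${globals:base_path}/03_primary/my_dataset.csv",
    }
}

catalog = resolve_globals(base_catalog, merged_globals)
print(catalog["my_dataset"]["filepath"])
# gs://dev-bucket-users/my_name/03_primary/my_dataset.csv
```

Here the local value wins because the override happens on the globals themselves, before the catalog entry is resolved.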
n
What was not working? It would be great if you could show a minimal example of how you were doing it.
@Ankita Katiyar Do you remember why
_
is banned for
${globals}
? I looked up the PR but couldn't find the explanation, I think we should at least mention it in the docs.
a
hmm, I think it’s because OmegaConfigLoader filters out all the variables that start with
_
after resolution : https://github.com/kedro-org/kedro/blob/e2b20a49159d62680d0131d1338f06d84b340a44/kedro/config/omegaconf_config.py#L343-L350
n
But this is the key of the config right? In this case it's not allowing
${globals: _some_config}
a
Because when globals is loaded first, the
_some_config
key will be lost. Globals is loaded first so that it is resolved across environments and then when parameters are loaded it fills in keys from
config_loader[globals]
and
_some_config
doesn’t exist at that point
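A small sketch of that last point, under the assumption stated in this thread that keys starting with `_` are filtered out when globals are loaded. `load_globals` and `lookup_global` are hypothetical names for illustration, not Kedro functions.

```python
def load_globals(raw_globals: dict) -> dict:
    """Mimic the filtering step: keys with a leading underscore are dropped."""
    return {k: v for k, v in raw_globals.items() if not k.startswith("_")}


def lookup_global(loaded_globals: dict, key: str):
    """Mimic a ${globals:key} lookup against the already-loaded globals."""
    if key not in loaded_globals:
        raise KeyError(f"globals key {key!r} not found")
    return loaded_globals[key]


# Stand-in for a globals.yml that mixes both naming styles.
raw = {"_some_config": "hidden", "some_config": "visible"}
loaded = load_globals(raw)

print(lookup_global(loaded, "some_config"))
# visible

try:
    lookup_global(loaded, "_some_config")
    underscore_key_found = True
except KeyError:
    # The underscore key was filtered out before any lookup could see it.
    underscore_key_found = False
```

So in this simplified model the lookup fails not because the syntax is wrong, but because the key is already gone by the time other config refers to it.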