#questions

Nok Lam Chan

08/08/2023, 10:40 AM
Hello everyone, I have a question regarding the usage of environments in combination with the OmegaConfigLoader.
I have a file called
catalog_globals.yml
in my
base/
config folder, and also in my
prod/
config folder. When I execute
kedro run --env=prod
, the settings from the file in
base/
are still used.
cc @Gerrit Schoettler
https://docs.kedro.org/en/latest/configuration/configuration_basics.html#configuration-environments By default, Kedro loads
base
+
local
. When you specify
prod
, it loads
base
+
prod
. If the same entry exists in both environments,
prod
takes priority; otherwise the entries are merged.
👍 1
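The precedence described above can be sketched in plain Python. This is only an illustration, not Kedro's implementation: it assumes the two environments are merged by top-level key, and the `merge_environments` helper and the sample paths are hypothetical.

```python
def merge_environments(base: dict, run_env: dict) -> dict:
    """Overlay the run environment on top of base.

    Assumption: entries are merged by top-level key, and the run
    environment wins when both environments define the same key.
    """
    merged = dict(base)
    merged.update(run_env)
    return merged


# Hypothetical catalog contents for the two environments.
base_catalog = {
    "catalog_entry_1": {"filepath": "base_path/file_name_1.parquet"},
    "catalog_entry_2": {"filepath": "base_path/file_name_2.parquet"},
}
prod_catalog = {
    "catalog_entry_2": {"filepath": "prod_path/file_name_2.parquet"},
}

merged = merge_environments(base_catalog, prod_catalog)
```

Here `catalog_entry_1` is kept from base because prod does not define it, while `catalog_entry_2` comes from prod.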

Gerrit Schoettler

08/08/2023, 10:46 AM
Thanks a lot @Nok Lam Chan! I still have the question on how to configure my catalog based on the env. It seems like the
OmegaConfigLoader
is not using the
catalog_globals.yml
file in the folder of the env (it's still using the file in the base folder).

Nok Lam Chan

08/08/2023, 10:50 AM
Could you share a minimal example that demonstrates your problem? For instance, the relevant entries in
base/catalog.yml
base/catalog_globals.yml
prod/catalog_globals.yml
prod/catalog.yml

Gerrit Schoettler

08/08/2023, 11:28 AM
base/catalog_globals.yml
_bucket_name: s3://ABC
prod/catalog_globals.yml
_bucket_name: local_path_1
base/catalog.yml
catalog_entry_1:
  type: pandas.ParquetDataSet
  filepath: ${_bucket_name}/file_name_1.parquet

catalog_entry_2:
  type: pandas.ParquetDataSet
  filepath: ${_bucket_name}/file_name_2.parquet
prod/catalog.yml
catalog_entry_2:
  type: pandas.ParquetDataSet
  filepath: local_path_2/file_name_2.parquet
When using the prod env, I would like the new definition of _bucket_name to be used in the base catalog.
My prod catalog does not contain all entries of the base catalog.
When I run
kedro run --env=prod
, it still uses the
_bucket_name
from
base/catalog_globals.yml
, although I intended to overwrite it.

Nok Lam Chan

08/08/2023, 12:05 PM
Indeed, what you observed is expected behavior. Variable interpolation (templating) only works within a single environment; you can't interpolate values from another environment. We consider cross-environment interpolation error-prone because it is hard to tell where a value comes from.
https://github.com/kedro-org/kedro/issues/2794 As you have already found, the solution for that will be a ${global} resolver, which is more explicit. I think it is still in development right now.
@Ankita Katiyar Can you share a workaround for this meanwhile?

Gerrit Schoettler

08/08/2023, 12:09 PM
Thanks a lot for the clarification and the pointer to the feature in development. @Ankita Katiyar I'd be very happy about a workaround, if you have any hints! 🙂

Ankita Katiyar

08/08/2023, 12:52 PM
I would say that creating a separate entry in
prod/catalog.yml
which will overwrite the one in
base/catalog.yml
is the only thing that will work
We always read and resolve all the config in
base
first and then load and resolve the config in
local
(or
prod
in this case). Even with the globals resolver that is still in development, we never propagate any values from a run-time environment to the base environment.
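The resolve-then-merge order described above can be sketched with the files from the question. This is a rough illustration, not Kedro's code: `string.Template` stands in for OmegaConf interpolation, the `resolve_environment` helper is hypothetical, and the top-level overlay of prod on base is an assumption.

```python
from string import Template


def resolve_environment(config: dict) -> dict:
    """Interpolate ${_name} placeholders using only this environment's variables."""
    # Keys starting with "_" act as template variables, as in catalog_globals.yml.
    variables = {k: v for k, v in config.items() if k.startswith("_")}
    entries = {k: v for k, v in config.items() if not k.startswith("_")}
    return {
        name: {field: Template(value).substitute(variables) for field, value in entry.items()}
        for name, entry in entries.items()
    }


base = {
    "_bucket_name": "s3://ABC",
    "catalog_entry_1": {"filepath": "${_bucket_name}/file_name_1.parquet"},
    "catalog_entry_2": {"filepath": "${_bucket_name}/file_name_2.parquet"},
}
prod = {
    "_bucket_name": "local_path_1",
    "catalog_entry_2": {"filepath": "local_path_2/file_name_2.parquet"},
}

# Each environment is resolved on its own first; only then does prod overlay base.
catalog = {**resolve_environment(base), **resolve_environment(prod)}
```

Because `catalog_entry_1` is resolved inside base before the merge, it keeps the `s3://ABC` prefix even though prod defines a different `_bucket_name`.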

Gerrit Schoettler

08/08/2023, 2:18 PM
Thanks a lot @Ankita Katiyar! I didn't know that values cannot be propagated between environments.
I ended up duplicating all catalog entries from base in prod
Still curiously looking forward to the new globals resolver!! Please keep up the development 🙂
🙌 1
@Nok Lam Chan @Ankita Katiyar Should I consider using namespaces to separate my catalog across environments?

Ankita Katiyar

08/08/2023, 2:59 PM
I’m not sure namespacing would be useful in this particular case. We are also working on this feature: https://github.com/kedro-org/kedro/issues/2531 Once it lands, you might be able to pass
kedro run --params: _my_bucket = "local_path_1"
to override “_my_bucket” for that particular run
👍 1

Gerrit Schoettler

08/10/2023, 10:03 AM
Thanks a lot @Ankita Katiyar!! Looking forward to that feature 🙂
Hi @Ankita Katiyar @Nok Lam Chan, I would like to pick up on this topic. I am now using Kedro 0.18.13 and I am struggling to modify the global parameters using --params. Can you show me again what
kedro run --params: _my_bucket = "local_path_1"
should look like?
👀 1

Ankita Katiyar

10/16/2023, 8:46 AM
Hi @Gerrit Schoettler, the globals functionality with
OmegaConfigLoader
is out already. Docs - https://docs.kedro.org/en/stable/configuration/advanced_configuration.html#how-to-use-global-variables-with-the-omegaconfigloader

Gerrit Schoettler

10/16/2023, 8:47 AM
yes, saw the update in 0.18.13 🙂 And I understand that the variable name can't start with "_" in globals.
a

Ankita Katiyar

10/16/2023, 8:47 AM
What we ended up doing was enabling people to use “globals” with a
globals:
resolver

Gerrit Schoettler

10/16/2023, 8:49 AM
okay, does that prohibit the use of --params? With my new globals.yml file everything works (setting the bucket_name for my catalog). I just can't modify it with --params

Ankita Katiyar

10/16/2023, 8:50 AM
Yep, you can’t manipulate globals with --params

Gerrit Schoettler

10/16/2023, 8:50 AM
is there any way to manipulate globals from the outside?
I also saw that using environment variables does not work, or should I try that?

Ankita Katiyar

10/16/2023, 8:51 AM
With Kedro 0.18.14, you will be able to specify in your catalog
catalog_entry_1:
  type: pandas.ParquetDataSet
  filepath: "${runtime_params:bucket_name, default_value}/file_name_1.parquet"
👍 1

Nok Lam Chan

10/16/2023, 8:51 AM
@Ankita Katiyar shouldn't it use the runtime_param resolver instead?
a

Ankita Katiyar

10/16/2023, 8:51 AM
But runtime_params wouldn’t work in
globals.yml
, so you would have to use it directly in the catalog
👍 1

Gerrit Schoettler

10/16/2023, 8:52 AM
Any ideas on how I can modify catalog parameters from the outside?

Ankita Katiyar

10/16/2023, 8:53 AM

Gerrit Schoettler

10/16/2023, 8:53 AM
Any thoughts on when it will be released?

Ankita Katiyar

10/16/2023, 8:53 AM
This week, ideally! 🙂
👍 2
It’s already merged to main branch on Github if you wanna try it out

Gerrit Schoettler

10/16/2023, 8:54 AM
okay! 🙂 I felt like I have been waiting for 0.18.13 and then I was confused when I still couldn't achieve my task of separating prod/test environments from the outside
Or any other thoughts on how I could achieve that?

Ankita Katiyar

10/16/2023, 8:58 AM
You can do the same thing with globals (assuming the premise is the same as in the original question): define
base/globals.yml
with
bucket_name: <s3_path>
and a
prod/globals.yml
with
bucket_name: <local_path>
, where the
bucket_name
variable will be overwritten by the
prod
version. Your
catalog.yml
will then look like:
catalog_entry_1:
  type: pandas.ParquetDataSet
  filepath: "${globals:bucket_name, default_value}/file_name_1.parquet"
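The effect of overriding a global per environment can be sketched as follows. This is only an approximation of the behaviour described here, not Kedro's implementation: it assumes the globals of the run environment overlay those of base before the catalog entry is resolved, and `string.Template` with `${bucket_name}` stands in for the `${globals:bucket_name}` resolver.

```python
from string import Template

# Hypothetical contents of base/globals.yml and prod/globals.yml.
base_globals = {"bucket_name": "s3://ABC"}
prod_globals = {"bucket_name": "local_path_1"}

# Assumption: globals from the run environment overlay those from base
# before any catalog entry is resolved.
merged_globals = {**base_globals, **prod_globals}

# The catalog entry lives only in base, yet it picks up the prod value.
filepath = Template("${bucket_name}/file_name_1.parquet").substitute(merged_globals)
```

With these inputs the base catalog entry resolves against `local_path_1`, which is the behaviour the original question was after.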

Gerrit Schoettler

10/16/2023, 8:59 AM
ah interesting, so the globals work across environments?
last time the catalog_globals.yml didn't work across environments

Ankita Katiyar

10/16/2023, 9:00 AM
Yep, I think I said it wouldn’t before it was implemented, but we ended up implementing the
globals
resolver so that it works across envs

Gerrit Schoettler

10/16/2023, 9:00 AM
premise is still the same as in the original question 🙂
ah okay, cool! Then I think I should try that and just use
--env=prod
to control the environment from the deployment pipeline

Ankita Katiyar

10/16/2023, 9:01 AM
catalog_globals.yml
is read as the “catalog” but not as “globals” if that makes sense, so variables in
catalog_globals
are only available to catalog entries and have to begin with an underscore so that the config loader does not read them as separate catalog entries
Yeah, you can use
--env=prod
or you can just define a default value to be used when a global value is not defined

Gerrit Schoettler

10/16/2023, 9:04 AM
Cool, thanks a lot for the super fast reply @Ankita Katiyar!!
❤️ 2
Hi @Ankita Katiyar @Nok Lam Chan, I now succeeded in using the runtime params. My catalog entries now contain:
${runtime_params:bucket_name, <default_value>}
Is there any way to take the default value from the globals.yml, instead of hardcoding it in the catalog?
I just found that this works:
${runtime_params:bucket_name, ${globals:bucket_name}}
👍🏼 1
👍 1

Nok Lam Chan

10/23/2023, 2:51 PM
Yes, this already works.
Note that the opposite
${globals:bucket_name, ${runtime_params:bucket_name}}
won’t work, and this is by design.
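The precedence expressed by the nested `${runtime_params:bucket_name, ${globals:bucket_name}}` form can be emulated in plain Python. This is a sketch of the lookup order only, not how Kedro or OmegaConf implement resolvers; the `resolve_with_fallback` helper and the sample values are hypothetical.

```python
from typing import Mapping


def resolve_with_fallback(
    key: str,
    runtime_params: Mapping[str, str],
    global_vars: Mapping[str, str],
) -> str:
    """Return the run-time value for key if one was passed, else the global value."""
    if key in runtime_params:
        return runtime_params[key]
    return global_vars[key]


global_vars = {"bucket_name": "s3://ABC"}

# A value supplied at run time takes priority over the global.
with_override = resolve_with_fallback("bucket_name", {"bucket_name": "local_path_1"}, global_vars)

# Without a run-time value, the global acts as the default.
without_override = resolve_with_fallback("bucket_name", {}, global_vars)
```

The global value serves purely as a default here, so a run without any run-time parameter behaves exactly as it did with globals alone.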

Gerrit Schoettler

10/23/2023, 2:54 PM
Can confirm that it's working as I expected, I just didn't know that I need to nest the ${ }
👍🏼 1

Nok Lam Chan

10/23/2023, 2:55 PM
Cool! Let us know if you find any issues, feedback is welcome!
🙌 2

Gerrit Schoettler

10/23/2023, 2:55 PM
Yes, will let you know if anything doesn't work. Thanks for the very quick reply!!
🥳 1