# questions
n
Hello everyone, I have a question regarding the usage of environments in combination with the OmegaConfigLoader.
I have a file called `catalog_globals.yml` in my `base/` config folder, and also in my `prod/` config folder. When I execute `kedro run --env=prod`, the settings from the file in `base/` are still used.
cc @Gerrit Schoettler
https://docs.kedro.org/en/latest/configuration/configuration_basics.html#configuration-environments
Kedro environments work as `base` + `local` by default. When you specify `prod`, it means `base` + `prod`. If the same entry exists in both environments, `prod` takes priority; otherwise the entries are merged.
👍 1
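A minimal Python sketch of the merge rule described above. It is a toy model, not Kedro's implementation: the dictionaries, the paths `base_path` and `prod_path`, and the helper `merge_environments` are all hypothetical names chosen for illustration.

```python
# Toy model of "base + prod" merging (hypothetical data, not Kedro's code).
base_catalog = {
    "catalog_entry_1": {"type": "pandas.ParquetDataSet",
                        "filepath": "base_path/file_name_1.parquet"},
    "catalog_entry_2": {"type": "pandas.ParquetDataSet",
                        "filepath": "base_path/file_name_2.parquet"},
}
prod_catalog = {
    "catalog_entry_2": {"type": "pandas.ParquetDataSet",
                        "filepath": "prod_path/file_name_2.parquet"},
}


def merge_environments(base, override):
    """Keep every base entry; when a key exists in both, the override wins."""
    return {**base, **override}


merged = merge_environments(base_catalog, prod_catalog)
print(merged["catalog_entry_1"]["filepath"])  # base_path/file_name_1.parquet
print(merged["catalog_entry_2"]["filepath"])  # prod_path/file_name_2.parquet
```

Entries that exist only in `base` survive the merge, which is why a `prod` catalog does not need to repeat them.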
g
Thanks a lot @Nok Lam Chan! I still have a question about how to configure my catalog based on the env. It seems like the `OmegaConfigLoader` is not using the `catalog_globals.yml` file in the env's folder (it's still using the file in the base folder).
n
Could you share a minimal example to demonstrate your problem? Maybe show the relevant entries in
base/catalog.yml
base/catalog_globals.yml
prod/catalog_globals.yml
prod/catalog.yml
g
base/catalog_globals.yml
_bucket_name: <s3://ABC>
prod/catalog_globals.yml
_bucket_name: local_path_1
base/catalog.yml
catalog_entry_1:
  type: pandas.ParquetDataSet
  filepath: ${_bucket_name}/file_name_1.parquet

catalog_entry_2:
  type: pandas.ParquetDataSet
  filepath: ${_bucket_name}/file_name_2.parquet
prod/catalog.yml
catalog_entry_2:
  type: pandas.ParquetDataSet
  filepath: local_path_2/file_name_2.parquet
When using the prod env, I would like the new definition of `_bucket_name` to be used in the base catalog.
My prod catalog does not contain all entries of the base catalog.
When I run `kedro run --env=prod`, it still uses the `_bucket_name` from `base/catalog_globals.yml`, although I intended to overwrite it.
n
Indeed, what you observed is expected behavior. Variable interpolation (templating) only works within the same environment; you can’t interpolate values from another environment. We consider cross-environment interpolation error-prone because it is hard to know where a value comes from.
https://github.com/kedro-org/kedro/issues/2794 As you have already found, the solution for that will be a ${global} resolver, which is more explicit. I think this is still in development right now.
@Ankita Katiyar Can you share a workaround for this in the meantime?
g
Thanks a lot for the clarification and the pointer to the feature in development. @Ankita Katiyar I'd be very happy about a workaround, if you have any hints! 🙂
a
I would say that creating a separate entry in `prod/catalog.yml`, which will overwrite the one in `base/catalog.yml`, is the only thing that will work.
We always read and resolve all the config in `base` first and then load and resolve the config in `local` (or `prod` in this case). Even with the globals resolver that is still in development, we never propagate any values from run-time environments to the base environment.
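The "resolve each environment first, then merge" order explained above can be sketched in plain Python. This is a simplified model under stated assumptions, not Kedro's code: the helpers `resolve` and `load_environment` are hypothetical, and `${key}` placeholders are filled with a small regular expression instead of OmegaConf.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def resolve(config):
    """Fill ${key} placeholders using top-level keys of the same config only."""

    def substitute(value):
        if isinstance(value, str):
            return PLACEHOLDER.sub(lambda m: str(config[m.group(1)]), value)
        if isinstance(value, dict):
            return {k: substitute(v) for k, v in value.items()}
        return value

    return substitute(config)


def load_environment(*files):
    """Merge the files of ONE environment, resolve them, drop '_' templating keys."""
    merged = {}
    for content in files:
        merged.update(content)
    resolved = resolve(merged)
    return {k: v for k, v in resolved.items() if not k.startswith("_")}


base = load_environment(
    {"_bucket_name": "s3://ABC"},  # base/catalog_globals.yml
    {   # base/catalog.yml
        "catalog_entry_1": {"type": "pandas.ParquetDataSet",
                            "filepath": "${_bucket_name}/file_name_1.parquet"},
        "catalog_entry_2": {"type": "pandas.ParquetDataSet",
                            "filepath": "${_bucket_name}/file_name_2.parquet"},
    },
)
prod = load_environment(
    {"_bucket_name": "local_path_1"},  # prod/catalog_globals.yml
    {   # prod/catalog.yml
        "catalog_entry_2": {"type": "pandas.ParquetDataSet",
                            "filepath": "local_path_2/file_name_2.parquet"},
    },
)

# Environments are merged only after each one has been resolved on its own.
catalog = {**base, **prod}
print(catalog["catalog_entry_1"]["filepath"])  # s3://ABC/file_name_1.parquet
print(catalog["catalog_entry_2"]["filepath"])  # local_path_2/file_name_2.parquet
```

In this model `catalog_entry_1` keeps the `base` bucket because its placeholder was already filled before the `prod` files were looked at, which matches the behavior reported in the thread.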
g
Thanks a lot @Ankita Katiyar! I didn't know that values cannot be propagated between environments.
I ended up duplicating all catalog entries from base in prod
Still curiously looking forward to the new globals resolver!! Please keep up the development 🙂
🙌 1
@Nok Lam Chan @Ankita Katiyar Should I consider using namespaces to separate my catalogs for different environments?
a
I’m not sure if namespacing would be useful in this particular case. We are also working on this feature: https://github.com/kedro-org/kedro/issues/2531 Once it is available, you might be able to pass
kedro run --params: __my_bucket_ = "local__path_1"
to override “_ my_ bucket” for that particular run
👍 1
g
Thanks a lot @Ankita Katiyar!! Looking forward to that feature 🙂
Hi @Ankita Katiyar @Nok Lam Chan, I would like to pick up on this topic. I am now using Kedro 0.18.13 and I am struggling to modify the global parameters using --params. Can you show me again what
kedro run --params: __my_bucket_ = "local__path_1"
should look like?
👀 1
a
Hi @Gerrit Schoettler, the globals functionality with `OmegaConfigLoader` is out already. Docs: https://docs.kedro.org/en/stable/configuration/advanced_configuration.html#how-to-use-global-variables-with-the-omegaconfigloader
g
yes, saw the update in 0.18.13 🙂 And I understand that the variable name can't start with "_" in globals.
a
What we ended up doing was enabling people to use “globals” with a `globals:` resolver.
g
okay, does that prohibit the use of --params? With my new globals.yml file everything works (setting the bucket_name for my catalog). I just can't modify it with --params
a
Yep, you can’t manipulate globals with --params
g
is there any way to manipulate globals from the outside?
I also saw that using environment variables does not work, or should I try that?
a
With Kedro 0.18.14, you will be able to specify in your catalog
catalog_entry_1:
  type: pandas.ParquetDataSet
  filepath: "${runtime_params:bucket_name, default_value}/file_name_1.parquet"
👍 1
n
@Ankita Katiyar shouldn't it use the runtime_param resolver instead?
a
But `runtime_params` wouldn’t work in `globals.yml`; you would have to do it directly in the catalog.
👍 1
g
Any ideas on how I can modify catalog parameters from the outside?
a
g
Any thoughts on when it will be released?
a
This week, ideally! 🙂
👍 2
It’s already merged to main branch on Github if you wanna try it out
g
okay! 🙂 I felt like I have been waiting for 0.18.13 and then I was confused when I still couldn't achieve my task of separating prod/test environments from the outside
Or any other thoughts on how I could achieve that?
a
You can do the same thing with globals (assuming the premise is the same as in the original question): define `base/globals.yml` with `bucket_name: <s3_path>` and a `prod/globals.yml` with `bucket_name: <local_path>`, where the `bucket_name` variable will be overwritten by the `prod` version. Your `catalog.yml` will then look like:
catalog_entry_1:
  type: pandas.ParquetDataSet
  filepath: "${globals:bucket_name, default_value}/file_name_1.parquet"
g
ah interesting, so the globals work across environments?
last time the catalog_globals.yml didn't work across environments
a
Yep yep, I think I mentioned it wouldn’t before it was implemented, but we implemented the `globals` resolver to work across envs.
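A rough Python sketch of why the `globals` approach works across environments, as described above: the globals of `base` and the run environment are merged first, and only then substituted into the catalog. This is an illustrative model, not Kedro's implementation; `resolve_globals` and `catalog_for` are hypothetical helpers, and the bucket values are reused from the earlier example in this thread.

```python
import re

GLOBALS_PATTERN = re.compile(r"\$\{globals:\s*(\w+)\}")

base_globals = {"bucket_name": "s3://ABC"}      # base/globals.yml
prod_globals = {"bucket_name": "local_path_1"}  # prod/globals.yml

base_catalog = {                                # base/catalog.yml
    "catalog_entry_1": {
        "type": "pandas.ParquetDataSet",
        "filepath": "${globals:bucket_name}/file_name_1.parquet",
    },
}


def resolve_globals(value, global_values):
    """Replace every ${globals:key} in a string with its merged global value."""
    return GLOBALS_PATTERN.sub(lambda m: str(global_values[m.group(1)]), value)


def catalog_for(env_globals):
    """Merge globals across environments first, then fill in the base catalog."""
    merged_globals = {**base_globals, **env_globals}
    return {
        name: {key: resolve_globals(val, merged_globals) for key, val in entry.items()}
        for name, entry in base_catalog.items()
    }


print(catalog_for({})["catalog_entry_1"]["filepath"])            # s3://ABC/file_name_1.parquet
print(catalog_for(prod_globals)["catalog_entry_1"]["filepath"])  # local_path_1/file_name_1.parquet
```

With this ordering, the catalog entry needs to be written only once in `base`, and the environment decides which bucket it points to.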
g
premise is still the same as in the original question 🙂
ah okay, cool! Then I think I should try that and just use `--env=prod` to control the environment from the deployment pipeline
a
`catalog_globals.yml` is read as “catalog” config, not as “globals”, if that makes sense. Variables in `catalog_globals.yml` are therefore only available to catalog entries, and they have to begin with an underscore so that the config loader does not read them as separate catalog entries.
Yeah, you can use `--env=prod`, or you can just define a default value to be used when a global value is not defined.
g
Cool, thanks a lot for the super fast reply @Ankita Katiyar!!
❤️ 2
Hi @Ankita Katiyar @Nok Lam Chan, I now succeeded in using the runtime params. My catalog entries now contain:
${runtime_params:bucket_name, <default_value>}
Is there any way to take the default value from the globals.yml, instead of hardcoding it in the catalog?
I just found that this works:
${runtime_params:bucket_name, ${globals:bucket_name}}
👍🏼 1
👍 1
n
Yes, this is working already. Note that the opposite, `${globals:bucket_name, ${runtime_params:bucket_name}}`, won’t work, and this is by design.
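The fallback behavior of the nested expression `${runtime_params:bucket_name, ${globals:bucket_name}}` can be modeled with a few lines of Python. This only mimics the semantics discussed in the thread; `resolve_bucket_name` is a hypothetical helper, not a Kedro function, and the example values are made up.

```python
def resolve_bucket_name(runtime_params, global_values):
    """Prefer the runtime parameter; fall back to the value from globals."""
    if "bucket_name" in runtime_params:
        return runtime_params["bucket_name"]
    return global_values["bucket_name"]


global_values = {"bucket_name": "s3://ABC"}  # stands in for globals.yml

# No runtime parameter passed: the global value is used.
print(resolve_bucket_name({}, global_values))  # s3://ABC

# Runtime parameter passed for this run: it takes priority.
print(resolve_bucket_name({"bucket_name": "local_path_1"}, global_values))  # local_path_1
```

The default lives in one place (globals) and can still be overridden per run, so nothing has to be hardcoded in the catalog.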
g
Can confirm that it's working as I expected. I just didn't know that I needed to nest the ${ }
👍🏼 1
n
Cool! Let us know if you find any issues. Feedback is welcome!
🙌 2
g
Yes, will let you know if anything doesn't work. Thanks for the very quick reply!!
🥳 1