#questions

Lodewic van Twillert

09/03/2023, 6:36 PM
Hi all, I am happy to learn that `0.18.13` now lets us use globals and templating variables with `OmegaConfigLoader`. But... I have trouble getting it to work with configuration for multiple environments. Would you expect this simplified example to work? I have a `catalog.yml` using templating variables, and I want to change the template variables per environment, so that hopefully I don't need to re-define the catalog entries.
# base/catalog.yml
primary_input_1:
  type: pandas.CSVDataSet
  filepath: ${_storage.prefix}/data/primary/input_1.csv
  credentials: ${_storage.credentials}
Base template for `_storage`:
# base/catalog_templating.yml
_storage:
    prefix: "base"
    credentials: null
Local template for `_storage`:
# local/catalog_templating.yml
_storage:
    prefix: "data"   # <-- different here
    credentials: null
Template for `aws`, as an example:
# aws/catalog_templating.yml
_storage:
    prefix: "s3://my_data_bucket/data"
    credentials: dev_s3
When testing, the `_storage` variable does not get overwritten when changing environments. I expected the dataset to have a different filepath in the `local` environment than in the `base` environment. But... nothing changes in the `local` env 😞
`base`:
%reload_kedro ../ --env=base
catalog.datasets.primary_input_1._describe()
>>>
{
    'filepath': PurePosixPath('./test-omega-templating/base/data/primary/input_1.csv'),
    'protocol': 'file',
    'load_args': {},
    'save_args': {'index': False},
    'version': None
}
`local`:
%reload_kedro ../ --env=local
catalog.datasets.primary_input_1._describe()
>>>
{
    'filepath': PurePosixPath('./test-omega-templating/base/data/primary/input_1.csv'),
    'protocol': 'file',
    'load_args': {},
    'save_args': {'index': False},
    'version': None
}
^ I expected `'filepath': ./test-omega-templating/data/data/primary/input_1.csv`
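A minimal sketch of why the `local` value never reaches the catalog entry, assuming (as the reply below explains) that each environment's interpolation is resolved on its own before the environments are merged. This is plain Python over dictionaries, not Kedro's or OmegaConf's implementation; `BASE_ENV`, `LOCAL_ENV`, `lookup`, `resolve` and `load_catalog` are hypothetical names, and the `credentials` key is left out for brevity.

```python
import re

# Hypothetical stand-ins for the YAML files in the question.
# base/catalog.yml and base/catalog_templating.yml, combined:
BASE_ENV = {
    "primary_input_1": {
        "type": "pandas.CSVDataSet",
        "filepath": "${_storage.prefix}/data/primary/input_1.csv",
    },
    "_storage": {"prefix": "base", "credentials": None},
}
# local/catalog_templating.yml (the local env has no catalog.yml):
LOCAL_ENV = {
    "_storage": {"prefix": "data", "credentials": None},
}

_REF = re.compile(r"\$\{([^{}:]+)\}")


def lookup(config, dotted_key):
    """Follow a dotted key such as '_storage.prefix' through nested dicts."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(node, root):
    """Replace every ${a.b} in string values with the value found in root."""
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, str):
        return _REF.sub(lambda m: str(lookup(root, m.group(1))), node)
    return node


def load_catalog(*envs):
    """Assumed order: resolve each env by itself, then merge top-level keys."""
    merged = {}
    for env in envs:
        merged.update(resolve(env, env))
    # Assumption: keys starting with "_" are templates, not datasets.
    return {k: v for k, v in merged.items() if not k.startswith("_")}


print(load_catalog(BASE_ENV)["primary_input_1"]["filepath"])
# base/data/primary/input_1.csv
print(load_catalog(BASE_ENV, LOCAL_ENV)["primary_input_1"]["filepath"])
# base/data/primary/input_1.csv  (already resolved before local was merged)
```

Under that assumption, the `local` `_storage` only replaces the `base` `_storage` key after `primary_input_1` has already been turned into a plain string, which matches the unchanged filepath above.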

Ankita Katiyar

09/03/2023, 6:42 PM
Try using globals with the “globals” resolver for templating values across environments - https://docs.kedro.org/en/stable/configuration/advanced_configuration.html#how-to-use-global-variables-with-the-omegaconfigloader
The global variables should be in `globals.yml`.
Regular variable interpolation does not work across environments by design: catalog entries with variable interpolation are resolved per environment. Globals, however, are all loaded first (the keys in `local` overwrite those in `base`) and only then resolved in the catalog, parameters, etc.
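The ordering described here can be sketched in plain Python, assuming the globals of all environments are merged by top-level key first and the `${globals:...}` references are filled in afterwards. It illustrates the idea only and is not Kedro's code; `BASE_GLOBALS`, `LOCAL_GLOBALS`, `BASE_CATALOG`, `merge_globals`, `lookup`, `resolve_globals` and `catalog_for` are hypothetical names.

```python
import re

# Hypothetical stand-ins for base/globals.yml and local/globals.yml.
BASE_GLOBALS = {"storage": {"prefix": "base", "credentials": None}}
LOCAL_GLOBALS = {"storage": {"prefix": "data", "credentials": None}}

# The catalog is defined once, in base, and refers to the globals.
BASE_CATALOG = {
    "primary_input_1": {
        "type": "pandas.CSVDataSet",
        "filepath": "${globals:storage.prefix}/data/primary/input_1.csv",
    }
}

_GLOBALS_REF = re.compile(r"\$\{globals:([^{}]+)\}")


def merge_globals(*layers):
    """Later layers overwrite the top-level keys of earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


def lookup(config, dotted_key):
    """Follow a dotted key such as 'storage.prefix' through nested dicts."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve_globals(node, global_values):
    """Fill in ${globals:a.b} references using the merged globals."""
    if isinstance(node, dict):
        return {k: resolve_globals(v, global_values) for k, v in node.items()}
    if isinstance(node, str):
        return _GLOBALS_REF.sub(
            lambda m: str(lookup(global_values, m.group(1))), node
        )
    return node


def catalog_for(*globals_layers):
    """Merge the globals first, then resolve the single base catalog."""
    return resolve_globals(BASE_CATALOG, merge_globals(*globals_layers))


print(catalog_for(BASE_GLOBALS)["primary_input_1"]["filepath"])
# base/data/primary/input_1.csv
print(catalog_for(BASE_GLOBALS, LOCAL_GLOBALS)["primary_input_1"]["filepath"])
# data/data/primary/input_1.csv
```

Because the merge happens before any reference is resolved, the same catalog entry picks up whichever `storage.prefix` the active environment provides.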

Lodewic van Twillert

09/03/2023, 7:07 PM
Oh ok, yeah, I considered that first but then opted against `globals` because I wouldn't actually re-use these values for both params and catalog. The docs say:
"The benefit of using globals over regular variable interpolation is that the global variables are shared across different configuration types, such as catalog and parameters."
I didn't want that benefit, but the sentence does suggest that regular variable interpolation would otherwise do the same thing. And it would make my template variables a little shorter in syntax 🤓 It makes sense to use globals though, given how they are loaded. I will try that next, thanks for the explanation 👍
To follow up: your suggestion worked, and I also managed to reduce the syntax the way I wanted. Here's what I ended up with, hope to hear your thoughts:
• `globals.yml` - the only file that changes between environments
• `catalog.yml` - I don't want to re-define this in any environment, so there is only one, in `base`
• `catalog_templating.yml` - captures the globals and turns them into composite parameters used throughout the catalog, which also shortens the syntax in `catalog.yml`
`base`:
#globals.yml
storage:
    prefix: base
    credentials: null

folders:
    raw: 01_raw
    intermediate: 02_intermediate
    primary: 03_primary
    feature: 04_feature
    model_input: 05_model_input
    models: 06_models
    model_output: 07_model_output
    reporting: 08_reporting
#catalog.yml 
primary_input_1:
  type: pandas.CSVDataSet
  filepath: ${_folders.primary}/input_1.csv
  credentials: ${_credentials}
#catalog_templating.yml
_folders:
    raw: ${globals:storage.prefix}/${globals:folders.raw}
    intermediate: ${globals:storage.prefix}/${globals:folders.intermediate}
    primary: ${globals:storage.prefix}/${globals:folders.primary}
    feature: ${globals:storage.prefix}/${globals:folders.feature}
    model_input: ${globals:storage.prefix}/${globals:folders.model_input}
    models: ${globals:storage.prefix}/${globals:folders.models}
    model_output: ${globals:storage.prefix}/${globals:folders.model_output}
    reporting: ${globals:storage.prefix}/${globals:folders.reporting}

_credentials: ${globals:storage.credentials}
`local`:
#globals.yml
storage:
    prefix: data
    credentials: null
`aws`:
#globals.yml
storage:
    prefix: "s3://my_data_bucket/data"
    credentials: dev_s3
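To check what this layout should produce per environment, here is a small plain-Python sketch. It assumes that an environment's `globals.yml` overwrites the top-level keys of the `base` one, that `catalog.yml` and `catalog_templating.yml` are combined before resolving, and that `_`-prefixed keys are dropped from the final catalog. None of this is Kedro's actual implementation; `ENV_GLOBALS`, `CATALOG`, `lookup`, `resolve` and `catalog_for_env` are hypothetical names, and the folder list is shortened to two entries.

```python
import re

# base/globals.yml (folder list shortened for the sketch).
BASE_GLOBALS = {
    "storage": {"prefix": "base", "credentials": None},
    "folders": {"raw": "01_raw", "primary": "03_primary"},
}
# Per-environment globals.yml files; only `storage` changes.
ENV_GLOBALS = {
    "base": {},
    "local": {"storage": {"prefix": "data", "credentials": None}},
    "aws": {
        "storage": {"prefix": "s3://my_data_bucket/data", "credentials": "dev_s3"}
    },
}
# base/catalog.yml and base/catalog_templating.yml, combined.
CATALOG = {
    "primary_input_1": {
        "type": "pandas.CSVDataSet",
        "filepath": "${_folders.primary}/input_1.csv",
        "credentials": "${_credentials}",
    },
    "_folders": {
        "raw": "${globals:storage.prefix}/${globals:folders.raw}",
        "primary": "${globals:storage.prefix}/${globals:folders.primary}",
    },
    "_credentials": "${globals:storage.credentials}",
}

_REF = re.compile(r"\$\{(globals:)?([^{}:]+)\}")


def lookup(mapping, dotted_key):
    """Follow a dotted key such as 'folders.primary' through nested dicts."""
    node = mapping
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve(value, catalog, global_values):
    """Resolve ${key} against the catalog and ${globals:key} against globals."""
    if isinstance(value, dict):
        return {k: resolve(v, catalog, global_values) for k, v in value.items()}
    if not isinstance(value, str):
        return value

    def target(match):
        source = global_values if match.group(1) else catalog
        # The looked-up value may itself contain references, so resolve again.
        return resolve(lookup(source, match.group(2)), catalog, global_values)

    whole = _REF.fullmatch(value)
    if whole:
        return target(whole)  # keep non-string values such as None intact
    return _REF.sub(lambda m: str(target(m)), value)


def catalog_for_env(env):
    """Merge globals (env over base), resolve, then drop '_' template keys."""
    global_values = {**BASE_GLOBALS, **ENV_GLOBALS[env]}
    resolved = resolve(CATALOG, CATALOG, global_values)
    return {k: v for k, v in resolved.items() if not k.startswith("_")}


for env in ("base", "local", "aws"):
    print(env, catalog_for_env(env)["primary_input_1"]["filepath"])
# base base/03_primary/input_1.csv
# local data/03_primary/input_1.csv
# aws s3://my_data_bucket/data/03_primary/input_1.csv
```

In a real project the same check is the `%reload_kedro ../ --env=<env>` and `catalog.datasets.primary_input_1._describe()` pair used earlier in the thread.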

Ankita Katiyar

09/04/2023, 3:13 PM
Perhaps we should add to the docs that cross-environment interpolation is also a benefit of using the globals feature. 😄