# questions
a
Hi team, quick question: is it possible to use the catalog from one environment (let's say 'prod') when running a pipeline in another environment (let's say 'prod2')? CC @Jose Luis Lavado Sánchez
l
Hi Alex, what exactly is your use-case?
I submitted this issue a few days ago. It proposes a `--from-env` flag that allows reading from one env and writing to another. Not sure if this is what you're looking for: https://github.com/kedro-org/kedro/issues/4155
If you wish to extend your catalog entries, you could look into codifying shared entries in `base` and overriding entries in another env, e.g. `cloud`. Kedro will load both `base` and `cloud`, and give priority to the configuration in `cloud`.
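(For illustration: a minimal sketch of how that layered lookup can be inspected programmatically, assuming Kedro's `OmegaConfigLoader` and a standard `conf/` layout; the env name `cloud` is just an example.)

```python
from kedro.config import OmegaConfigLoader

# Entries in conf/cloud/catalog.yml override same-named entries in
# conf/base/catalog.yml when both environments are loaded.
loader = OmegaConfigLoader(
    conf_source="conf",
    env="cloud",              # the overriding environment
    base_env="base",
    default_run_env="local",  # fallback when no env is given
)
catalog_config = loader["catalog"]  # merged dict: dataset name -> dataset config
```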
j
At execution time we use the context to access a key vault and set the needed creds; in this context we use the env variable to know which creds are the correct ones. So there are two envs that share a catalog (there are others in the project) but need different credentials, for example the same query structure but a different database.
Maybe the `--from-env` is the solution, not sure.
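(A rough sketch of the credential-selection pattern described above, assuming an `after_context_created` hook that reads `context.env`; `VAULT_URLS` and `fetch_secrets` are hypothetical stand-ins for the real key-vault client.)

```python
from kedro.framework.hooks import hook_impl

# Hypothetical mapping of Kedro environment -> key-vault endpoint.
VAULT_URLS = {
    "prod": "https://prod-vault.example.net",
    "prod_A": "https://prod-a-vault.example.net",
}

def fetch_secrets(vault_url: str) -> dict:
    """Hypothetical stand-in for a real key-vault client call."""
    raise NotImplementedError

class KeyVaultCredentialsHook:
    @hook_impl
    def after_context_created(self, context):
        # The active env decides which vault, and therefore which creds, to use.
        secrets = fetch_secrets(VAULT_URLS[context.env])
        # Inject the fetched secrets as runtime credentials.
        context.config_loader["credentials"] = secrets
```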
l
I think `base` and a custom environment should do the trick, no? Codify all your entries in `base`, and for your custom structure override them in another env.
j
Not sure, because there are other environments in the project that have the same datasets in the catalog but with other queries, config, etc. But maybe it could work just through the precedence order?
l
aha you have more envs
alternatively, you could implement support for selecting multiple environments with a priority order
it's something I've been thinking about for a while as well
j
Yes. What is the default precedence order in Kedro? For example, if I run env `dev`, will it look in `/conf/dev` and, if it doesn't find the dataset names there, look for them in `/conf/base`? If that's the case, I can make it work with that.
l
though the `--from-env` flag would cover your use-case already as well; you would only be reading stuff from this env. What the flag does is: it loads the `--from-env` catalog and attempts to override all input datasets of the selected pipeline (or selection of nodes) to use the catalog entries from the `--from-env`. (It currently errors out if an input dataset does not exist in the `--from-env`, but you could choose to skip the error and default to the catalog entry from the `--env`.)
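(Since `--from-env` is only proposed in issue #4155, the following is a conceptual sketch of the override it describes, not an existing Kedro API; the function and argument names are made up.)

```python
def apply_from_env(
    run_catalog: dict,       # catalog entries from the --env environment
    source_catalog: dict,    # catalog entries from the --from-env environment
    pipeline_inputs: set,    # free inputs of the selected pipeline or nodes
) -> dict:
    """Rewire pipeline inputs to read from the source environment's catalog."""
    merged = dict(run_catalog)
    for name in pipeline_inputs:
        # The proposal errors out when an input is missing from the source
        # env; a softer variant could fall back to run_catalog[name] instead.
        merged[name] = source_catalog[name]
    return merged
```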
no, it's essentially a dictionary, so the YAML dicts are merged in very much the same fashion Python merges dicts
but it merges the base env, the local env, and the selected env
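(A toy illustration of that merge with made-up entries, not Kedro internals: later environments win on duplicate top-level keys, exactly like Python dict merging.)

```python
base = {"companies": {"type": "pandas.CSVDataset", "filepath": "data/companies.csv"}}
local = {}  # typically machine-local overrides
prod_A = {"companies": {"type": "pandas.SQLQueryDataset", "sql": "SELECT ..."}}

# base < local < selected env: the last dict wins for duplicate keys.
catalog = {**base, **local, **prod_A}
assert catalog["companies"]["type"] == "pandas.SQLQueryDataset"
```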
j
So, if I set for example `--from-env prod --env prod_A`, in the context I will see that the environment is `prod_A`, but I will get the catalog from `/conf/prod/catalog.yml`?
l
it will only use the entries from `prod` for the INPUTS of your pipeline
all others will use `prod_A`
j
Okay perfect, that's all I need, thanks!
l
check it out! can drop notes on the issue as well
m
@Jose Luis Lavado Sánchez To get back to your comment earlier ("if I run env `dev`, will it look in `/conf/dev` and, if it doesn't find the dataset names there, look for them in `/conf/base`?"): this is exactly how Kedro works, as described here: https://docs.kedro.org/en/stable/configuration/configuration_basics.html#configuration-environments You can use settings to specify what your default overriding environment and base environment should be if you want them to be different from "base" and "local": https://docs.kedro.org/en/stable/configuration/configuration_basics.html#how-to-change-the-default-overriding-environment
j
Thank you, that solves my problem
👍 1