# questions
g
Hello! I have a question w.r.t. the catalog: what is the simplest way to know whether a dataset is defined in the "base" env or has been overridden by the selected env's catalog? Is this info stored anywhere? Thanks! :)
h
Someone will reply to you shortly. In the meantime, this might help:
d
Off the top of my head, this isn't stored anywhere. The config loader just merges the hierarchical config using `OmegaConf`; no information about the source is kept.
What you could do is attach some `metadata` to all entries in each file, basically stating the source? Then you could access that field. But there's no built-in way I know of.
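A minimal sketch of that idea, using plain dicts to stand in for catalog entries (the entry names and the `source_env` metadata key are illustrative, not a Kedro convention):

```python
# Tag each catalog entry with the env it came from, then look the
# tag up after the configs are merged.

base_catalog = {
    "companies": {"type": "pandas.CSVDataset", "filepath": "data/companies.csv",
                  "metadata": {"source_env": "base"}},
    "model": {"type": "pickle.PickleDataset", "filepath": "data/model.pkl",
              "metadata": {"source_env": "base"}},
}
prod_catalog = {
    "companies": {"type": "pandas.CSVDataset", "filepath": "s3://bucket/companies.csv",
                  "metadata": {"source_env": "prod"}},
}

# Destructive merge: an env entry replaces the base entry wholesale,
# so the tag survives the merge.
merged = {**base_catalog, **prod_catalog}

def source_of(name: str) -> str:
    """Return the env a dataset's effective definition came from."""
    return merged[name]["metadata"]["source_env"]

print(source_of("companies"))  # prod
print(source_of("model"))      # base
```

The catch is that you have to keep the tags in sync by hand in every YAML file, which is why doing it automatically at load time (below) is more attractive.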
In case @Merel knows something, since she drove the work on the new config loader.
g
Thanks @Deepyaman Datta! That's a good idea. Do you think it's possible to have this done automatically at load time? I don't see how it would be possible with a hook.
One alternative I can think of is to load the catalog and then call `.to_config()` for both "base" and the selected env, and infer where datasets come from based on the differences.
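That comparison can be sketched without Kedro at all, treating the two catalog configs as plain dicts (the helper name and the simplifying assumption, that appearing in the env config counts as an override, are mine):

```python
def infer_sources(base_cfg: dict, env_cfg: dict) -> dict:
    """Map each dataset name to where its effective definition comes from.

    Assumption: a dataset counts as overridden as soon as it appears in
    the env config, whether or not the definition actually differs.
    """
    sources = {}
    for name in {**base_cfg, **env_cfg}:
        sources[name] = "env" if name in env_cfg else "base"
    return sources

base_cfg = {"companies": {"filepath": "data/companies.csv"},
            "model": {"filepath": "data/model.pkl"}}
env_cfg = {"companies": {"filepath": "s3://bucket/companies.csv"}}

print(infer_sources(base_cfg, env_cfg))
# {'companies': 'env', 'model': 'base'}
```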
m
Don't have a lot of time to dig right now, but when using destructive merge (the default) and with debug logging on, you'll see this in the log messages:

```
"Config from path '%s' will override the following "
                "existing top-level config keys: %s"
```
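To surface that message you need DEBUG-level logging for Kedro's config machinery; a minimal sketch (the `kedro.config` logger name is an assumption based on the module path, not something confirmed in this thread):

```python
import logging

# Make DEBUG records visible at all...
logging.basicConfig(level=logging.DEBUG)

# ...and enable them specifically for Kedro's config loaders, so the
# "will override the following existing top-level config keys" message
# (a %-style format string in the source) shows up when envs merge.
logging.getLogger("kedro.config").setLevel(logging.DEBUG)
```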
d
I don't think it's possible with a hook, unfortunately. However, you can create your own config loader (extending the base `OmegaConfigLoader`), and you could extend it very slightly by defining your own merge strategy, or accepting one in the config loader's constructor. For example, the destructive merge strategy @Merel mentioned: https://github.com/kedro-org/kedro/blob/0.19.11/kedro/config/omegaconf_config.py#L529-L544 Here, you could insert a key into each item in the dict to be merged, using the `env_path`? That should work; you'd probably need to play around with it a bit.
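A rough sketch of that merge idea, stripped of Kedro internals: a destructive-style dict merge that stamps each overriding entry with the env directory it was loaded from (the function name and the `_source_env` key are illustrative, not Kedro fields):

```python
from pathlib import Path

def merge_with_source(base: dict, overrides: dict, env_path: str) -> dict:
    """Destructive merge that stamps each overriding entry with its source env.

    Mimics the idea behind OmegaConfigLoader's destructive merge strategy;
    an entry from `overrides` replaces the base entry wholesale.
    """
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict):
            # Tag the entry with the last path component, e.g. "conf/prod" -> "prod".
            value = {**value, "_source_env": Path(env_path).name}
        merged[key] = value
    return merged

base = {"model": {"type": "pickle.PickleDataset"}}
prod = {"model": {"type": "pickle.PickleDataset", "filepath": "s3://bucket/m.pkl"}}
result = merge_with_source(base, prod, "conf/prod")
print(result["model"]["_source_env"])  # prod
```

In a real subclass you'd apply the same stamping inside the overridden merge method, where the `env_path` is already in scope.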
e
Indeed, you can't access `env` through hooks.
g
Thank you everyone! I think your solution should work nicely for me @Deepyaman Datta πŸ™‚
Hi again! Is there any way to disable config merging altogether? What worries me in practice is that, when loading a dataset, we don't know which env it is defined in. For example, let's assume I have been developing a model and created the pipelines in my "dev" env. The model itself is defined as a dataset in my base env. When I move the pipelines to my prod environment, I forget to move my model dataset to my prod env, and Kedro keeps using the base env to find it. A couple of months later, I work on improving the model, and the result of my trial and error is moved to prod by mistake; I don't even notice, because I don't know where my datasets are loaded from. In practice, in this specific case I'd use MLflow to manage models, so this would be very unlikely, but it could still happen with, say, intermediate data.
d
I don't think there's an out-of-the-box option (unless some configurable merge strategy does that?), but you could also always define a custom config loader. This is probably also a question where you might get better answers in #C03RKP2LW64; @Merel and others spent a lot of time on config loaders. (Oops, this was in #C03RKP2LW64 already, haven't had enough coffee yet. β˜•)
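One way a custom loader could address the "silent fallback to base" worry: instead of disabling merging, fail fast when any dataset is defined only in base. A sketch under my own assumptions (the function name is hypothetical, and whether you'd want this for every env is a project decision):

```python
def assert_fully_overridden(base_cfg: dict, env_cfg: dict, env: str) -> None:
    """Raise if any dataset would silently fall back to its base definition."""
    missing = sorted(set(base_cfg) - set(env_cfg))
    if missing:
        raise ValueError(
            f"Datasets defined in 'base' but not redefined in '{env}': {missing}"
        )

base_cfg = {"companies": {}, "model": {}}
prod_cfg = {"companies": {}}
try:
    assert_fully_overridden(base_cfg, prod_cfg, "prod")
except ValueError as err:
    print(err)  # Datasets defined in 'base' but not redefined in 'prod': ['model']
```

Hooked into a custom config loader's merge step, this would have caught the forgotten model dataset at load time instead of months later.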
πŸ‘ 1
πŸŽ‰ 1
g
@Merel I would still like to have different envs, but I would like all datasets to be redefined in all of them. I am afraid that, as I can't check which env a specific dataset is defined in, I may have a dev pipeline modifying critical prod assets.
m
You still get to decide which environment overrides which: https://docs.kedro.org/en/stable/configuration/configuration_basics.html#how-to-change-the-default-overriding-environment. But of course that doesn't solve the case where a dataset is missing from the overriding env.
I'd have to dive into this a bit deeper to come up with a solution.
g
Thank you @Merel πŸ™‚