Kedro #questions

Iñigo Hidalgo

04/26/2024, 12:23 PM

Hey 🙂 I'm looking for some extra clarity on the order of operations of custom resolvers. Here is a modified version of the example from https://getindata.com/blog/kedro-dynamic-pipelines/:

model_options:
  target: costs
  primary_key: [a, b]
  columns_to_load: ${append:${.primary_key}, ${.target}}

price_predictor:
  _overrides:
    target: price
  model_options: ${merge:${model_options},${._overrides}}

  base:
    model_options: ${..model_options}
  
  candidate3:
    _overrides:
      primary_key: [x, y, z]
      target: revenue
    model_options: ${merge:${..model_options},${._overrides}}

I was pleasantly surprised to find that for candidate3, `columns_to_load` (`${append:${.primary_key}, ${.target}}`) correctly resolved to `['x', 'y', 'z', 'revenue']` as opposed to `['a', 'b', 'costs']`. Is there a predetermined order of operations when resolving config? This is the exact behavior I want, but I am not sure why it behaves this way, and whether this is dependable behavior. I don't know why the merge was done first and the append at the end, which meant that the resolver had the "true" values of primary key and target available. I would've expected that maybe the first thing to be interpolated would be direct references to other keys, so `model_options.columns_to_load` would already be `[a, b, costs]` when it came time to be merged into price_predictor and candidate3. EDIT: clarity

Iñigo Hidalgo

04/26/2024, 12:26 PM

Also I realize I am really torturing kedro's principles here 😵‍💫

marrrcin

04/26/2024, 2:31 PM

Dynamic pipelines FTW 💛 I think the ordering is mostly determined by OmegaConf and how it resolves stuff, rather than by Kedro

Iñigo Hidalgo

04/26/2024, 2:47 PM

Hmm yeah I think I'm missing some omegaconf knowledge and particularly how kedro uses it, to truly understand this

"Interpolations are evaluated lazily on access." (from their documentation). I'm not really sure what "on access" implies for Kedro config loading. But from that statement I'm gonna assume I'm safe to depend on this functionality until proven otherwise hahaha, it's too useful

Nok Lam Chan

04/26/2024, 4:47 PM

@marrrcin is right that most of the stuff is `OmegaConf`. "Lazily on access" is not true for Kedro, as config is eagerly resolved during the configuration resolution. There are some additional layers to make `$global` work nicely with other config as well.

Nok Lam Chan

04/26/2024, 4:52 PM

Sorry for not answering the question directly, it's quite complicated and I am a bit far from core Kedro these days. https://noklam.github.io/blog/posts/kedro_config_loader/2023-11-16-kedro-config-loader-dive-deep.html 90% of the OmegaConf stuff happens in the grey area (without the $global resolver). This is where the "lazy" eval is supposed to happen, but `kedro` resolves it in eager mode via:

if key == "parameters":
    # Merge with runtime parameters only for "parameters"
    return OmegaConf.to_container(
        OmegaConf.merge(*aggregate_config, self.runtime_params), resolve=True
    )

Nok Lam Chan

04/26/2024, 4:53 PM

The diagram color changes when I copy image - here is the screenshot

Iñigo Hidalgo

04/26/2024, 5:07 PM

Yeah I figured Kedro would be doing some forced resolving on its end, but I THINK what I mentioned might still apply when resolving within a single environment's config. What I was taking lazy to mean in this case is that while OmegaConf will return `model_options.columns_to_load` as `[a, b, costs]` to Kedro, when resolving the node `price_predictor.candidate3.model_options`, where it makes reference to `merge:${..model_options}`, it will actually return the unresolved node `${append:${.primary_key}, ${.target}}`. Honestly, this is purely a guess, but it's the only thing I can come up with which would explain the behavior I observed.

Nok Lam Chan

04/26/2024, 6:13 PM

I think you are right, within that grey area it's mostly just OmegaConf. I couldn't confirm this since the only way to find out is probably to attach a debugger and follow along. I think it can easily take a person more than 5 minutes to figure out the resolved value, and I am not really a fan of this, but it's probably complicated by nature so maybe there is no way around it.

Nok Lam Chan

04/26/2024, 6:14 PM

There are some people who want to use dataclasses as config, and I think it quickly becomes a philosophical discussion about whether OOP is good. While config has a certain hierarchical nature, configs aren't really classes, and too many merges and overrides make it unreadable.

Iñigo Hidalgo

04/29/2024, 8:56 AM

Yeah in this example there are 2 levels of inheritance, but in the use case I'm writing it's only one. There's a base pipeline template and one override per candidate. Considering the lack of editor support for yaml "inheritance" I am definitely not going to be using this feature frequently, for now only in one single application where there is a need for rewriting a lot of pipelines with very minor changes in config.

Iñigo Hidalgo

04/29/2024, 8:57 AM

Further, eventually we will do some config validation using pydantic so we should be able to catch issues relatively easily.
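A rough sketch of what such validation could look like on an already resolved `model_options` dict. The `ModelOptions` schema and the `validate_model_options` helper are hypothetical; the thread does not describe the actual models.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class ModelOptions(BaseModel):
    # Hypothetical schema mirroring the keys used in this thread.
    target: str
    primary_key: List[str]
    columns_to_load: List[str]


def validate_model_options(resolved: dict) -> ModelOptions:
    # Raises pydantic.ValidationError if a key is missing or has the wrong type.
    return ModelOptions(**resolved)


good = validate_model_options(
    {
        "target": "revenue",
        "primary_key": ["x", "y", "z"],
        "columns_to_load": ["x", "y", "z", "revenue"],
    }
)

try:
    # columns_to_load is missing here, e.g. after a bad override.
    validate_model_options({"target": "revenue", "primary_key": ["x", "y", "z"]})
    missing_key_rejected = False
except ValidationError as err:
    missing_key_rejected = True
    print(err)
```

Running the check on the resolved output means a mistake in a merge or override shows up as a validation error instead of a silent wrong value.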
