# questions
i
Hey 🙂 I'm looking for some extra clarity on the order of operations of custom resolvers. Here is a modified version of the example from https://getindata.com/blog/kedro-dynamic-pipelines/:
model_options:
  target: costs
  primary_key: [a, b]
  columns_to_load: ${append:${.primary_key}, ${.target}}

price_predictor:
  _overrides:
    target: price
  model_options: ${merge:${model_options},${._overrides}}

  base:
    model_options: ${..model_options}
  
  candidate3:
    _overrides:
      primary_key: [x, y, z]
      target: revenue
    model_options: ${merge:${..model_options},${._overrides}}
I was pleasantly surprised to find that for candidate3, `columns_to_load` (`${append:${.primary_key}, ${.target}}`) correctly resolved to `['x', 'y', 'z', 'revenue']` as opposed to `['a', 'b', 'costs']`. Is there a predetermined order of operations when resolving config? This is the exact behavior I want, but I am not sure why it behaves this way, or whether it is dependable. I don't know why the merge was done first and the append last, which meant that the resolver had the "true" values of primary_key and target available. I would've expected direct references to other keys to be interpolated first, so `model_options.columns_to_load` would already be `[a, b, costs]` by the time it was merged into price_predictor and candidate3. EDIT: clarity
Also I realize I am really torturing kedro's principles here 😵‍💫
m
Dynamic pipelines FTW 💛 I think the ordering is mostly determined by OmegaConf and how it resolves things, rather than by Kedro
i
Hmm yeah I think I'm missing some omegaconf knowledge and particularly how kedro uses it, to truly understand this
Interpolations are evaluated lazily on access.
From their documentation. I'm not really sure what "on access" implies for kedro configloading. But from that statement I'm gonna assume I'm safe to depend on this functionality until proven otherwise hahaha it's too useful
this 1
n
@marrrcin is right that most of this is `OmegaConf`. "Lazily on access" is not true for Kedro, as the config is eagerly resolved during configuration loading. There are some additional layers to make `$global` work nicely with other config as well.
Sorry for not answering the question directly; it's quite complicated and I am a bit far from core Kedro these days. See https://noklam.github.io/blog/posts/kedro_config_loader/2023-11-16-kedro-config-loader-dive-deep.html 90% of the OmegaConf stuff happens in the grey area of the diagram (without the `$global` resolver). This is where the "lazy" evaluation is supposed to happen, but `kedro` resolves it in eager mode via:
if key == "parameters":
    # Merge with runtime parameters only for "parameters"
    return OmegaConf.to_container(
        OmegaConf.merge(*aggregate_config, self.runtime_params), resolve=True
    )
The diagram color changes when I copy image - here is the screenshot
i
Yeah, I figured kedro would be doing some forced resolving on its end, but I THINK what I mentioned might still apply when resolving within a single environment's config. What I was taking lazy to mean in this case is that while OmegaConf will return `model_options.columns_to_load` as `[a, b, costs]` to kedro, when resolving the node `price_predictor.candidate3.model_options`, where it makes reference to `merge:${..model_options}`, it will actually return the unresolved node `${append:${.primary_key}, ${.target}}`. Honestly, this is purely a guess, but it's the only thing I can come up with which would explain the behavior I observed.
šŸ‘šŸ¼ 1
n
I think you are right, within that grey area it's mostly just OmegaConf. I couldn't confirm this, since the only way to find out is probably to attach a debugger and step through it. Working out the resolved value can easily take a person more than 5 minutes, and I am not really a fan of that, but it's probably complicated by nature, so maybe there is no way around it.
Some people want to use dataclasses as config; I think it quickly becomes a philosophical discussion about whether OOP is good. While config is hierarchical by nature, configs aren't really classes, and too many merges and overrides make them unreadable.
i
Yeah, in this example there are 2 levels of inheritance, but in the use case I'm writing there's only one. There's a base pipeline template and one override per candidate. Considering the lack of editor support for yaml "inheritance", I am definitely not going to be using this feature frequently; for now, only in one single application where there is a need to rewrite a lot of pipelines with very minor changes in config.
Further, eventually we will do some config validation using pydantic so we should be able to catch issues relatively easily.
šŸ‘šŸ¼ 1