# questions
i
Starting to mess around with all the goodies I've been missing from kedro 0.18, with a particular focus on interactive Jupyter exploration and all the OmegaConf improvements. I am trying to get a catalog set up in a flat directory structure (or inside a conf/ folder):
my_notebook.ipynb
conf/
  credentials.yml
  catalog.yml
#credentials.yml
azure_blob:
  storage_options:
    account_name: ${oc.env:STORAGE_ACCOUNT_ENV_KEY}
#catalog.yml
input_data:
  type: polars.GenericDataset
  filepath: az://${oc.env:CONTAINER_NAME_ENV_KEY}
  credentials: azure_blob
# notebook
from kedro.config import OmegaConfigLoader
from kedro.io import DataCatalog
conf_loader = OmegaConfigLoader("conf/", base_env="", default_run_env="")

conf_loader["catalog"]
UnsupportedInterpolationType: Unsupported interpolation type oc.env
    full_key: filtered_allocation.filepath
    object_type=dict
I seem to be missing this step to enable resolvers: https://docs.kedro.org/en/stable/configuration/advanced_configuration.html#how-to-use-resolvers-in-the-omegaconfigloader How would I do this in the interactive "standalone" way? Thanks!
FWIW I've also tried just adding this block
from omegaconf.resolvers import oc  # provides the oc.env resolver function

CONFIG_LOADER_CLASS = OmegaConfigLoader

CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "oc.env": oc.env,
    }
}
before trying to load the config but it didn't make a difference, not surprising since it's meant to go inside settings.py
Got it!
conf_loader = OmegaConfigLoader("conf/", base_env="", default_run_env="", **CONFIG_LOADER_ARGS)
Should've expected that haha
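Put together, the standalone setup from this thread might look like the sketch below. It assumes a Kedro version whose `OmegaConfigLoader` accepts `custom_resolvers` (as used above) and a flat `conf/` folder containing `catalog.yml`; the helper name `load_catalog_config` is made up for illustration.

```python
def load_catalog_config(conf_dir: str) -> dict:
    """Return the resolved catalog config from a flat conf directory."""
    # Third-party imports live inside the helper so it can be defined
    # (and pasted into a notebook cell) before Kedro is importable.
    from kedro.config import OmegaConfigLoader
    from omegaconf.resolvers import oc

    loader = OmegaConfigLoader(
        conf_dir,
        base_env="",         # flat layout: no conf/base subfolder
        default_run_env="",  # flat layout: no conf/local subfolder
        custom_resolvers={"oc.env": oc.env},  # re-enable ${oc.env:...} lookups
    )
    return loader["catalog"]
```

With `CONTAINER_NAME_ENV_KEY` set in the environment, `load_catalog_config("conf/")["input_data"]["filepath"]` would be expected to come back with the variable substituted.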
d
What could we have done to make things clearer on your journey here?
i
I would say this one's on me, it didn't take me super long to make the inductive leap to actually pass the variable called "config_loader_args" as args to the config loader 😆 I was following this blog by @Amanda (link) which helped me gather all the initial setup arguments, but maybe expanding that blog into a full documentation page where it's more findable could help in the future, since I only knew of its existence from seeing it shared on linkedin. Then the further step into extending with resolvers, custom filepaths for config etc are very well documented in the settings.py paradigm, but expanding that into interactive development could warrant some additional documentation, though a lot of it might be duplicated, and I know there's still quite a lot of discussion about what the canonical recommendation should be for a few of these things.
The blog is also missing references to credentials, and I had to rely on my previous knowledge that I had to pass credentials=conf_loader["credentials"] to the DataCatalog.from_config method
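As a rough sketch of that step: the helper below takes an already-built config loader and hands both the catalog and the credentials config to `DataCatalog.from_config`. The helper name `build_catalog` is hypothetical, and it assumes the loader can find both a catalog and a credentials file in the conf folder.

```python
from typing import Any, Mapping


def build_catalog(conf_loader: Mapping[str, Any]):
    """Build a DataCatalog, passing credentials alongside the catalog config."""
    # Imported inside the helper so it can be defined before Kedro is importable.
    from kedro.io import DataCatalog

    return DataCatalog.from_config(
        catalog=conf_loader["catalog"],
        # Catalog entries such as `credentials: azure_blob` are looked up
        # by name in this mapping when the datasets are created.
        credentials=conf_loader["credentials"],
    )
```

In the notebook this would be called as `catalog = build_catalog(conf_loader)`, after which `catalog.load("input_data")` should pick up the `azure_blob` credentials.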
d
Thank you so much for the feedback
y
I played with the blog post yesterday, and I think part of it should probably be in the documentation. The "kedro in Jupyter notebook" section is only about using a notebook in a full kedro project and the %reload_kedro magic, and this is a bit confusing.
My bad, it is in the spaceflights example, but a bit hard to discover in the docs
i
ref for the blog content in docs.kedro.org: https://docs.kedro.org/en/stable/notebooks_and_ipython/notebook-example/add_kedro_to_a_notebook.html My feedback relating to credentials etc. still stands: there's a mention of credentials but not really how to get those into the catalog
👍 2
d
really appreciate the feedback both
j
@Iñigo Hidalgo have you spotted the new docs that go with the blog post https://docs.kedro.org/en/stable/notebooks_and_ipython/notebook-example/add_kedro_to_a_notebook.html ?
i
@Jo Stichbury I did, but only after Yolan pointed out their existence, and I did feel like I only found them by "randomly" clicking around this page: https://docs.kedro.org/en/stable/notebooks_and_ipython/index.html In the same way that you have a paragraph explaining what "Use a Jupyter notebook for Kedro project experiments" contains, a small blurb explaining that "Add Kedro features to a notebook" gives a brief step-by-step intro to adding the DataCatalog etc., without the burden of a preexisting project, might have eased discoverability. Right now it's just a bullet point which I didn't pay much attention to at first glance, only when I was looking for it intentionally.
Right now there's a paragraph explaining "Use a jupyter notebook" then at the end of that paragraph there's the same "Use a jupyter notebook" in a bullet point right beneath the new documentation, which made it seem like something of secondary importance
j
Ah, thanks. I see what's happened, there's a paragraph missing as a result of a dodgy merge. I'll fix it and it'll come in the next big release.
👍 1
y
I think it may be a good idea to slightly change the wording to make the difference between "Add Kedro features to a notebook" and "Use a Jupyter notebook for Kedro project experiments" more explicit. It seems obvious now that I know what it refers to, but if I were a data analyst with very little knowledge of kedro, I am not sure I'd know which section I should refer to.
👍 1
j
Now in a PR, thanks for your feedback. And @Yolan Honoré-Rougé that's a good point, although I'm not sure how else to phrase it. What do you think, something like "Add Kedro to a notebook" and "Add a notebook to Kedro" to make it just a bit simpler? Or shall I reword at the risk of more complexity but better explanation?
Feel free to comment on the PR if it's easier: https://github.com/kedro-org/kedro/pull/3210
👍 1
y
Maybe "Add Kedro to an existing notebook"?
"Add Kedro to a notebook" and "Add a notebook to Kedro" looks nice, but I am not sure the difference will be crystal clear to users 😅
I think the titles should be more explicit about the difference between the two sections
But honestly it's already pretty good, I don't have a much better suggestion
More explicit, but longer "Add Kedro to a notebook without / outside a full Kedro project"?
👍 1
i
Is there a way to specify a "default" resolver? I ask this because I want to try to replicate the globals_dict=os.environ from the old TemplatedConfigLoader, but to avoid breaking existing code it would be nice to be able to preserve the existing syntax to interpolate ${environment_variable} without needing to specify oc.env at the start
oc.env is definitely a better practice as it makes things explicit, but being able to switch to OmegaConfigLoader partially, without needing to change preexisting config, would make it easier
d
@Iñigo Hidalgo you could maybe just dump the env variables into the globals scope using an after_context_created hook?
i
What do you mean by globals scope? Setting the environment variables as global python variables?
d
sorry, the globals_dict
i
I thought the whole concept of globals was basically removed in favor of OC resolvers
d
ah of course 🤦
I still think you can mutate the context
or actually pass your os.environ to extra_params when you create a context either via the CLI or via custom context
it’s a bit unpleasant
but it can be done
i
Looking at the OmegaConfigLoader implementation I see self._globals and a reference to a globals resolver, do you think that could be a way to rescue old functionality?
d
yeah subclassing OmegaConfigLoader may actually be the easiest solution here
i
It looks like _get_globals_value could already be doing that? Basically I would just have to include the environment variables in self._globals?
globals_oc = OmegaConf.create(self._globals)
interpolated_value = OmegaConf.select(
    globals_oc, variable, default=default_value
)
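To see what that lookup does in isolation, here is a small sketch that applies the same `OmegaConf.create` / `OmegaConf.select` pair to a plain dict. The function name `lookup_global` is made up for illustration; it is not part of Kedro, and it only mirrors the two calls quoted above.

```python
from typing import Any, Mapping


def lookup_global(globals_dict: Mapping[str, Any], variable: str, default: Any = None) -> Any:
    """Select `variable` (dot notation allowed) from a globals-style dict."""
    from omegaconf import OmegaConf  # third-party; imported only when called

    # Wrap the plain dict so nested keys can be addressed as "a.b".
    globals_oc = OmegaConf.create(dict(globals_dict))
    # Fall back to `default` when the key is absent.
    return OmegaConf.select(globals_oc, variable, default=default)
```

If the environment variables were merged into the globals dict first, a call such as `lookup_global({**existing_globals, **os.environ}, "HOME")` (with `existing_globals` standing in for whatever the loader already holds) would resolve them the same way as any other global.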
y
Or maybe with the after_context_created hook you can do something like context.config_loader._globals.update(os.environ)?
💰 1
🎉 1
i
Definitely seems to be the cleanest, thanks @Yolan Honoré-Rougé and @datajoely
y
Note that this won't work in 0.19 anymore though
i
Immutable context?
y
i
We're still on 0.17 so our aim will probably be to first get onto the latest stable 0.18, but thanks for the heads up as we are definitely trying to avoid hard-sticking ourselves unnecessarily
👍 1
I will add a comment under that issue though
y
(I was clearly interested in answering this question to push the issue forward 🤫 )
😂 2
n
Not sure what I am missing, overriding config_loader["globals"] should work?
The self._globals should be updated automatically IIRC (can't verify on my phone 😅)
y
Not sure if it works, the __getitem__ method of the config loaders does a lot of things under the hood (on my phone too, I can't check either)
n
https://github.com/kedro-org/kedro/blob/5ac14c3f43736c575346063b3f3c0d3494059219/tests/config/test_omegaconf_config.py#L1047-L1066 The test should make sure it works. If you want to keep your original globals and inject new config, you can do
# after_context_created
# my_conf is some dict; dict.update() returns None, so build the merged dict instead
conf["globals"] = {**conf["globals"], **my_conf}
👍 2
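A minimal sketch of that merge, written as a standalone helper so it can be tried on a plain dict. The name `merge_into_globals` is hypothetical, and it assumes, as suggested in the thread, that assigning to `conf["globals"]` is picked up by the loader. Because `dict.update` mutates in place and returns `None`, the helper builds a new merged dict before assigning it.

```python
import os
from typing import Any, Mapping, MutableMapping, Optional


def merge_into_globals(
    conf: MutableMapping[str, Any],
    extra: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Merge `extra` (defaults to os.environ) into conf["globals"]."""
    extra = os.environ if extra is None else extra
    # Later keys win, so values in `extra` override existing globals.
    merged = {**conf["globals"], **extra}
    conf["globals"] = merged
    return merged
```

Inside a project, the same helper could be called from an `after_context_created` hook as `merge_into_globals(context.config_loader)`.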
i
Thanks for that @Nok Lam Chan. I'm definitely out of my depth when getting into the internals of the Context etc. Is this something which should in theory still be viable in 0.19, or would it be impossible due to the immutable context?
n
If it works now it should remain working in 0.19. The immutable context shouldn't affect it that way. I am trying to find the ticket but I couldn't; I just saw @Yolan Honoré-Rougé's ticket https://github.com/kedro-org/kedro/issues/3214. This should be discussed before the 0.19 release. I would love to hear more opinions first, but my opinion for now is frozen attribute > frozen class. The main reason is that we don't have a better mechanism and it's quite common to create a stateful hook (with context in this case). Unless we come up with a better alternative, I don't see a benefit in blocking this. In addition, it works fine in 0.18.x without the extra protection, so maybe it's unnecessary.