Starting to mess around with all the goodies I ve been missi Kedro #questions

Iñigo Hidalgo

10/19/2023, 2:33 PM

Starting to mess around with all the goodies I've been missing from kedro 0.18, with a particular focus on interactive Jupyter exploration and all the OmegaConf improvements. I am trying to get a catalog set up in a flat directory structure (or inside a conf/ folder):

my_notebook.ipynb
conf/
  credentials.yml
  catalog.yml

#credentials.yml
azure_blob:
  storage_options:
    account_name: ${oc.env:STORAGE_ACCOUNT_ENV_KEY}

#catalog.yml
input_data:
  type: polars.GenericDataset
  filepath: az://${oc.env:CONTAINER_NAME_ENV_KEY}
  credentials: azure_blob

# notebook
from kedro.config import OmegaConfigLoader
from kedro.io import DataCatalog
conf_loader = OmegaConfigLoader("conf/", base_env="", default_run_env="")

conf_loader["catalog"]

UnsupportedInterpolationType: Unsupported interpolation type oc.env
    full_key: filtered_allocation.filepath
    object_type=dict

I seem to be missing this step to enable resolvers: https://docs.kedro.org/en/stable/configuration/advanced_configuration.html#how-to-use-resolvers-in-the-omegaconfigloader How would I do this in the interactive "standalone" way? Thanks!

Iñigo Hidalgo

10/19/2023, 2:36 PM

FWIW I've also tried just adding this block

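from kedro.config import OmegaConfigLoader
from omegaconf.resolvers import oc  # provides the built-in oc.env resolver

CONFIG_LOADER_CLASS = OmegaConfigLoader

CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "oc.env": oc.env,
    }
}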

before trying to load the config but it didn't make a difference, not surprising since it's meant to go inside settings.py

Iñigo Hidalgo

10/19/2023, 2:40 PM

Got it!

conf_loader = OmegaConfigLoader("conf/", base_env="", default_run_env="", **CONFIG_LOADER_ARGS)

Should've expected that haha

datajoely

10/19/2023, 2:51 PM

What could we have done to make things clearer on your journey here?

Iñigo Hidalgo

10/19/2023, 2:59 PM

I would say this one's on me, it didn't take me super long to make the inductive leap to actually pass the variable called "config_loader_args" as args to the config loader 😆 I was following this blog by @Amanda (link) which helped me gather all the initial setup arguments, but maybe expanding that blog into a full documentation page where it's more findable could help in the future, since I only knew of its existence from seeing it shared on linkedin. Then the further step into extending with resolvers, custom filepaths for config etc are very well documented in the settings.py paradigm, but expanding that into interactive development could warrant some additional documentation, though a lot of it might be duplicated, and I know there's still quite a lot of discussion about what the canonical recommendation should be for a few of these things.

K 2

Iñigo Hidalgo

10/19/2023, 3:01 PM

The blog is also missing references to credentials, and I had to rely on my previous knowledge that I had to pass credentials=conf_loader["credentials"] to the DataCatalog.from_config method
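For readers following along, the pieces from this part of the thread can be combined into one notebook helper. This is only a sketch under assumptions: the name `load_catalog` is hypothetical, the flat `conf/` layout is the one from the first message, and it presumes a Kedro release whose `OmegaConfigLoader` accepts `custom_resolvers`.

```python
def load_catalog(conf_source="conf/"):
    """Build a DataCatalog from a flat conf folder, outside a Kedro project."""
    # Imported inside the function so Kedro is only required when it is called.
    from kedro.config import OmegaConfigLoader
    from kedro.io import DataCatalog
    from omegaconf.resolvers import oc

    conf_loader = OmegaConfigLoader(
        conf_source,
        base_env="",         # no base/ subfolder: read conf_source directly
        default_run_env="",  # no local/ subfolder either
        # Registers oc.env so ${oc.env:...} also resolves in catalog.yml.
        custom_resolvers={"oc.env": oc.env},
    )
    return DataCatalog.from_config(
        catalog=conf_loader["catalog"],
        # Lets catalog entries such as `credentials: azure_blob` be looked up.
        credentials=conf_loader["credentials"],
    )
```

In a notebook this would be used as `catalog = load_catalog("conf/")` followed by `catalog.load("input_data")`, provided the environment variables referenced in the YAML files are set.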

datajoely

10/19/2023, 3:09 PM

Thank you so much for the feedback

Yolan Honoré-Rougé

10/19/2023, 3:18 PM

I played with the blog post yesterday, and I think part of it should probably be in the documentation. The "Kedro in Jupyter notebook" section is only about using a notebook in a full kedro project and the %reload_kedro magic, and this is a bit confusing.

Yolan Honoré-Rougé

10/19/2023, 3:20 PM

My bad, it is in the spaceflights example, but a bit hard to discover in the docs

Iñigo Hidalgo

10/19/2023, 3:35 PM

ref for the blog content in docs.kedro.org: https://docs.kedro.org/en/stable/notebooks_and_ipython/notebook-example/add_kedro_to_a_notebook.html My feedback relating to credentials etc. still stands: there's a mention of credentials, but not really how to get those into the catalog

👍 2

datajoely

10/19/2023, 3:42 PM

really appreciate the feedback both

Jo Stichbury

10/19/2023, 4:31 PM

@Iñigo Hidalgo have you spotted the new docs that go with the blog post https://docs.kedro.org/en/stable/notebooks_and_ipython/notebook-example/add_kedro_to_a_notebook.html ?

Iñigo Hidalgo

10/19/2023, 4:41 PM

@Jo Stichbury I did, but only after Yolan pointed out their existence, and I felt like I only found them by "randomly" clicking around this page: https://docs.kedro.org/en/stable/notebooks_and_ipython/index.html In the same way that you have a paragraph explaining what "Use a Jupyter notebook for Kedro project experiments" contains, a small blurb explaining that "Add Kedro features to a notebook" gives a brief step-by-step intro to adding the DataCatalog etc., without the burden of a preexisting project, might have eased discoverability. Right now it's just a bullet point which I didn't pay much attention to at first glance, only when I was looking for it intentionally.

Iñigo Hidalgo

10/19/2023, 4:42 PM

Right now there's a paragraph explaining "Use a jupyter notebook" then at the end of that paragraph there's the same "Use a jupyter notebook" in a bullet point right beneath the new documentation, which made it seem like something of secondary importance

Jo Stichbury

10/19/2023, 4:58 PM

Ah, thanks. I see what's happened, there's a paragraph missing as a result of a dodgy merge. I'll fix it and it'll come in the next big release.

👍 1

Yolan Honoré-Rougé

10/19/2023, 5:13 PM

I think it may be a good idea to slightly change the wording to make the difference between "Add Kedro features to a notebook" and "Use a Jupyter notebook for Kedro project experiments" more explicit. It seems obvious now that I know what it refers to, but if I were a data analyst with very little knowledge of kedro, I am not sure I'd know which section I should refer to.

👍 1

Jo Stichbury

10/19/2023, 5:14 PM

Now in a PR, thanks for your feedback. And @Yolan Honoré-Rougé that's a good point, although I'm not sure how else to phrase it. What do you think, something like "Add Kedro to a notebook" and "Add a notebook to Kedro" to make it just a bit simpler? Or shall I reword at the risk of more complexity but better explanation?

Jo Stichbury

10/19/2023, 5:15 PM

Feel free to comment on the PR if it's easier: https://github.com/kedro-org/kedro/pull/3210

👍 1

Yolan Honoré-Rougé

10/19/2023, 5:15 PM

Maybe "Add Kedro to an existing notebook"?

Yolan Honoré-Rougé

10/19/2023, 5:18 PM

"Add Kedro to a notebook" and "Add a notebook to Kedro" looks nice, but I am not sure the difference will be crystal clear to users 😅

Yolan Honoré-Rougé

10/19/2023, 5:19 PM

I think the titles should be more explicit about the difference between the two sections

Yolan Honoré-Rougé

10/19/2023, 5:20 PM

But honestly it's already pretty good, I don't have a much better suggestion

thankyou 1

Yolan Honoré-Rougé

10/19/2023, 5:22 PM

More explicit, but longer "Add Kedro to a notebook without / outside a full Kedro project"?

👍 1

Iñigo Hidalgo

10/23/2023, 5:08 PM

Is there a way to specify a "default" resolver? I ask this because I want to try to replicate the globals_dict=os.environ from the old TemplatedConfigLoader, but to avoid breaking existing code it would be nice to be able to preserve the existing syntax to interpolate ${environment_variable} without needing to specify oc.env at the start

Iñigo Hidalgo

10/23/2023, 5:09 PM

oc.env is definitely a better practice as it makes things explicit, but being able to switch to omegaconfigloader partially without needing to change preexisting config would make it easier
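To make the two syntaxes concrete: the old style writes `${CONTAINER_NAME}`, while the OmegaConf style writes `${oc.env:CONTAINER_NAME}`. If rewriting existing config ever becomes acceptable, a one-off migration could look roughly like the sketch below. The helper `to_oc_env` is hypothetical, it only handles bare `${NAME}` placeholders, and it assumes every such placeholder is meant to come from the environment (as with `globals_dict=os.environ`).

```python
import re

# Matches ${NAME} where NAME is a bare identifier: no resolver prefix, no dots.
_BARE_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def to_oc_env(text):
    """Rewrite ${NAME} placeholders as ${oc.env:NAME}, leaving others untouched."""
    return _BARE_PLACEHOLDER.sub(r"${oc.env:\1}", text)


legacy_line = "filepath: az://${CONTAINER_NAME_ENV_KEY}"
migrated_line = to_oc_env(legacy_line)
```

Placeholders that already use a resolver, such as `${oc.env:X}`, do not match the pattern and are left as they are.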

datajoely

10/23/2023, 5:09 PM

@Iñigo Hidalgo you could maybe just dump the env variables into the globals scope using the after_context_created hook?

Iñigo Hidalgo

10/23/2023, 5:13 PM

What do you mean by globals scope? Setting the environment variables as global python variables?

datajoely

10/23/2023, 5:13 PM

sorry, the globals_dict

Iñigo Hidalgo

10/23/2023, 5:13 PM

I thought the whole concept of globals was basically removed in favor of OC resolvers

datajoely

10/23/2023, 5:13 PM

ah of course facepalming

datajoely

10/23/2023, 5:14 PM

I still think you can mutate the context

datajoely

10/23/2023, 5:15 PM

or actually pass your os.environ as extra_params when you create a context, either via the CLI or via a custom context

datajoely

10/23/2023, 5:15 PM

it’s a bit unpleasant

datajoely

10/23/2023, 5:16 PM

but it can be done

datajoely

10/23/2023, 5:16 PM

https://docs.kedro.org/en/stable/_modules/kedro/framework/context/context.html#KedroContext.__init__

Iñigo Hidalgo

10/23/2023, 5:18 PM

Looking at the OmegaConfigLoader implementation I see self._globals and a reference to a globals resolver. Do you think that could be a way to rescue the old functionality?

datajoely

10/23/2023, 5:18 PM

yeah subclassing OmegaConfigLoader may actually be the easiest solution here

Iñigo Hidalgo

10/23/2023, 5:20 PM

It looks like _get_globals_value could already be doing that? Basically I would just have to include the environment variables in self._globals?

Iñigo Hidalgo

10/23/2023, 5:20 PM

globals_oc = OmegaConf.create(self._globals)
interpolated_value = OmegaConf.select(
    globals_oc, variable, default=default_value
)

Yolan Honoré-Rougé

10/23/2023, 5:21 PM

Or maybe with the after_context_created hook you can do something like context.config_loader._globals.update(os.environ)

💰 1

🎉 1

Iñigo Hidalgo

10/23/2023, 5:21 PM

Definitely seems to be the cleanest, thanks @Yolan Honoré-Rougé and @datajoely

Yolan Honoré-Rougé

10/23/2023, 5:22 PM

Note that this won't work in 0.19 anymore though

Iñigo Hidalgo

10/23/2023, 5:22 PM

Immutable context?

Yolan Honoré-Rougé

10/23/2023, 5:23 PM

Yes https://github.com/kedro-org/kedro/issues/3214#issue-1956041633

Iñigo Hidalgo

10/23/2023, 5:24 PM

We're still on 0.17 so our aim will probably be to first get onto the latest stable 0.18, but thanks for the heads up as we are definitely trying to avoid hard-sticking ourselves unnecessarily

👍 1

Iñigo Hidalgo

10/23/2023, 5:24 PM

I will add a comment under that issue though

Yolan Honoré-Rougé

10/23/2023, 5:25 PM

(I clearly had an interest in answering this question, to push the issue forward 🤫)

😂 2

Nok Lam Chan

10/23/2023, 5:58 PM

Not sure what I am missing, overriding config_loader["globals"] should work?

Nok Lam Chan

10/23/2023, 5:59 PM

The self._globals should be updated automatically IIRC (can't verify on my phone 😅)

Yolan Honoré-Rougé

10/23/2023, 6:37 PM

Not sure if it works, the __getitem__ method of the config loaders does a lot of things under the hood (on my phone too, I can't check either)

Nok Lam Chan

10/24/2023, 12:35 PM

https://github.com/kedro-org/kedro/blob/5ac14c3f43736c575346063b3f3c0d3494059219/tests/config/test_omegaconf_config.py#L1047-L1066 The test should make sure it works. If you want to keep your original globals and inject new config, then you can do

# after_context_created
conf["globals"] = {**conf["globals"], **my_conf}  # my_conf is some dict; dict.update() would return None

👍 2

Iñigo Hidalgo

10/24/2023, 2:35 PM

Thanks for that @Nok Lam Chan. I'm definitely out of my depth when getting into the internals of the Context etc. Is this something which should in theory still be viable in 0.19, or would it be impossible due to the immutable context?

Nok Lam Chan

10/24/2023, 3:15 PM

If it works now, it should keep working in 0.19. The immutable context shouldn't affect it in that way. I was trying to find the ticket but couldn't; I just saw @Yolan Honoré-Rougé's ticket https://github.com/kedro-org/kedro/issues/3214. This should be discussed before the 0.19 release. I would love to hear more opinions first, but my opinion right now is frozen attribute > frozen class. The main reason is that we don't have a better mechanism, and it's quite common to create a stateful hook (with context in this case). Unless we come up with a better alternative, I don't see a benefit in blocking this. In addition, it works fine in 0.18.x without the extra protection, so maybe it's unnecessary.
