Does anyone know if there is a reason why we could not pass the context to the `before_pipeline_run` hook? (Kedro #questions)


Florian d

05/30/2023, 1:48 PM

Does anyone know if there is a reason why we could not pass the context to the `before_pipeline_run` hook? In some cases it would be good to have access to the loaded config at that point.


FlorianGD

05/30/2023, 1:56 PM

I think the correct place to use the context in hooks is considered to be `after_catalog_created`. As for the reason, I do not know, but I think there were some discussions in issues on GitHub.

Florian d

05/30/2023, 2:04 PM

For sure, there is also `after_context_created`, which has access to the context. However, this use case needs three things: knowledge of which pipeline/nodes are run (to identify the root datasets), the catalog, and the context (to grab some info from the overall config that was used). In `after_context_created` I'm missing the pipeline; maybe that can be generated from the config 🤷. In `before_pipeline_run` I'm missing the overall config, but I'm not sure there is a reason for this. Maybe to prevent it from being modified?

Nok Lam Chan

05/30/2023, 2:06 PM

Can you describe what you are trying to do here and what you need?

Florian d

05/30/2023, 2:15 PM

Working on the `kedro-azureml` project. There we want to support local execution of Kedro pipelines that use AzureML datasets as root input datasets. For various reasons we currently can't use `azureml-fsspec` natively, so the idea at the moment is to use a hook that downloads the root input datasets when the project is run locally. The `azureml.yml` config that is created when using `kedro-azureml` holds the information we use to connect to AzureML and to query the SDK for the file paths and versions of registered data assets. I am not sure this is clear, as there are many aspects to this. Happy to answer any more questions. There are more details in this PR: https://github.com/tomasvanpottelbergh/kedro-azureml/pull/1 and some more related things in these issues/PRs: https://github.com/kedro-org/kedro-plugins/issues/200 https://github.com/getindata/kedro-azureml/pull/60
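For context, the "root input datasets" here are the pipeline's free inputs: datasets that some node consumes but no node produces. Kedro's `Pipeline.inputs()` returns these names (parameters included). The sketch below reproduces that idea without Kedro so it runs standalone; the `(inputs, outputs)` tuples standing in for nodes and the `root_input_datasets` helper are hypothetical, and it assumes parameters follow Kedro's `parameters` / `params:` naming so they can be filtered out before deciding what to download.

```python
from typing import Iterable, Sequence, Set, Tuple

# Hypothetical stand-in for a Kedro node: (input dataset names, output dataset names)
NodeIO = Tuple[Sequence[str], Sequence[str]]


def is_parameter(name: str) -> bool:
    """Kedro parameter inputs are named 'parameters' or 'params:<key>'."""
    return name == "parameters" or name.startswith("params:")


def root_input_datasets(nodes: Iterable[NodeIO]) -> Set[str]:
    """Datasets consumed by some node but produced by none, excluding parameters."""
    consumed: Set[str] = set()
    produced: Set[str] = set()
    for inputs, outputs in nodes:
        consumed.update(inputs)
        produced.update(outputs)
    return {name for name in consumed - produced if not is_parameter(name)}


if __name__ == "__main__":
    pipeline_nodes = [
        (["raw_sales", "params:threshold"], ["clean_sales"]),
        (["clean_sales", "raw_customers"], ["model_input"]),
        (["model_input"], ["model"]),
    ]
    print(sorted(root_input_datasets(pipeline_nodes)))  # ['raw_customers', 'raw_sales']
```

In an actual hook, the same set would come from calling `pipeline.inputs()` on the pipeline passed to `before_pipeline_run` and dropping the parameter entries.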

Nok Lam Chan

05/30/2023, 2:31 PM

We try to limit our API surface, and that's the primary reason why we define the spec and only expose certain arguments in specific hooks, as that's where we expect people to customise them. (And yes, we don't expect people to modify config in arbitrary places; as you can imagine, this could quickly become unmaintainable.) Supporting local execution of the pipeline sounds like a reasonable idea to me. A hook is a stateful object, so you can pass state between its methods. Here is one pattern you can use if the argument isn't available in the spec:


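```python
from kedro.framework.hooks import hook_impl


class SomeHook:
    @hook_impl
    def after_context_created(self, context):
        # Runs first: keep the config you need on the hook instance itself
        self.my_azure_config = xxx  # placeholder: read the AzureML config from context

    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # Kedro calls the same instance later, so the stored config is still available
        do_something_about_my_pipeline(pipeline, self.my_azure_config)  # placeholder
```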
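Why this works: Kedro registers the hook instances listed in `settings.py` (e.g. `HOOKS = (SomeHook(),)`) with its hook manager, and every hook call goes to those same instances, so an attribute set in one hook method is still there in a later one. The dependency-free sketch below mimics that calling sequence; `FakeHookManager`, `AzureConfigHook`, the dict-based "context" and the list-of-names "pipeline" are all hypothetical stand-ins for Kedro's pluggy-based hook manager, `KedroContext` and `Pipeline`.

```python
from typing import Any, Dict, List


class AzureConfigHook:
    """Hypothetical hook that carries state from one hook call to the next."""

    def __init__(self) -> None:
        self.azure_config: Dict[str, Any] = {}
        self.planned_downloads: List[str] = []

    def after_context_created(self, context: Dict[str, Any]) -> None:
        # Called first: keep the config needed later on the instance itself
        self.azure_config = context["config"]["azureml"]

    def before_pipeline_run(self, pipeline: List[str]) -> None:
        # Called later on the SAME instance, so the stored config is still here
        workspace = self.azure_config["workspace"]
        self.planned_downloads = [f"{name} from {workspace}" for name in pipeline]


class FakeHookManager:
    """Hypothetical stand-in for a hook manager holding registered instances."""

    def __init__(self) -> None:
        self._hooks: List[Any] = []

    def register(self, hook: Any) -> None:
        self._hooks.append(hook)

    def call(self, hook_name: str, **kwargs: Any) -> None:
        # Call the named method on every registered instance that defines it
        # (ordering between several hooks is not modelled here)
        for hook in self._hooks:
            method = getattr(hook, hook_name, None)
            if method is not None:
                method(**kwargs)


if __name__ == "__main__":
    hook = AzureConfigHook()
    manager = FakeHookManager()
    manager.register(hook)
    manager.call(
        "after_context_created",
        context={"config": {"azureml": {"workspace": "ws-dev"}}},
    )
    manager.call("before_pipeline_run", pipeline=["raw_sales", "raw_customers"])
    print(hook.planned_downloads)  # ['raw_sales from ws-dev', 'raw_customers from ws-dev']
```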


Nok Lam Chan

05/30/2023, 2:32 PM

This should allow you to access the necessary config in the hook where you also need the `pipeline`.


Florian d

05/30/2023, 2:47 PM

Ah, of course, thanks! I think this should help.


Nok Lam Chan

05/30/2023, 3:03 PM

Awesome 🙂

Iñigo Hidalgo

05/30/2023, 4:41 PM

Wow @Nok Lam Chan, I never realized that was a possible pattern. I guess I never got as far as reasoning that the hooks are passed as instantiated objects. That will be so useful for some things we're trying to migrate off 0.17.1.


Nok Lam Chan

05/30/2023, 4:53 PM

This is usually the way I recommend when it's needed; I guess we never documented it.


Yolan Honoré-Rougé

05/30/2023, 6:44 PM

@Nok Lam Chan When the `after_context_created` hook was introduced, we had this discussion (and much more) with @Antony Milne here: https://github.com/kedro-org/kedro/pull/1465#issuecomment-1118357158 and we came to the same conclusion. It is likely worth documenting, since the question arises from time to time. Most of the original discussion is tracked here: https://github.com/kedro-org/kedro/issues/506, with some in-depth information on how and why it was designed this way. Introducing the `after_context_created` hook instead of adding the context to `before_pipeline_run` was done to:
• ensure consistency between the interactive, programmatic and CLI workflows for running Kedro
• avoid breaking the hook's signature, so the change could be introduced faster


Nok Lam Chan

05/30/2023, 6:45 PM

Thank you @Yolan Honoré-Rougé 😀 I was looking for the issue but I couldn't find it.

Antony Milne

05/30/2023, 7:28 PM

Yeah, the fact that hooks are stateful is extremely useful. It's kind of an “unofficial” workaround (I think Ivan would not like it 😅), but I think it's necessary a lot of the time, and IMO we should document it. We limit the arguments available in a hook specification to the ones we think will be useful at that point in the execution timeline, because otherwise everything would be available in every hook, and it (a) gets cluttered and overwhelming and (b) probably doesn't encourage best practice for hook authors. But I would treat the arguments available in hook specs as us saying “these are what's probably useful to you” rather than “you're not allowed to use anything else in this hook”.


Iñigo Hidalgo

05/31/2023, 9:25 AM

I've been playing around with it and it's definitely super useful. But if it becomes documented, "supported" behavior, it would be important to specify in which order different hook classes are executed (from what @Yolan Honoré-Rougé mentioned here, it's "plugin install order, then order of definition").


Nok Lam Chan

05/31/2023, 10:50 AM

In what way would it affect you? Plugins aren't supposed to interact with each other. It does sometimes cause confusing behaviour, but I would love to learn more about that. https://github.com/kedro-org/kedro/issues/2493

Nok Lam Chan

05/31/2023, 10:52 AM

IIRC, you can force the order by importing the plugin hook in settings, as Kedro uses pluggy, which calls hooks in LIFO order. It can get more complicated because the hook implementation marker has arguments (`tryfirst`/`trylast`) that can move a certain hook to run first or last; if multiple hooks declare the same one, the order becomes unclear.
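To make the LIFO point concrete: pluggy's documented default is to call hook implementations in last-in, first-out registration order, and an implementation can opt out of that with `tryfirst`/`trylast`. The sketch below models only the default LIFO rule (no `tryfirst`/`trylast`, no wrappers), and `OrderedHookManager` and the hook names are hypothetical rather than Kedro's or pluggy's API.

```python
from typing import Callable, List


class OrderedHookManager:
    """Hypothetical model of pluggy's default ordering: last registered, first called."""

    def __init__(self) -> None:
        self._impls: List[Callable[[], str]] = []

    def register(self, impl: Callable[[], str]) -> None:
        self._impls.append(impl)

    def call(self) -> List[str]:
        # Iterate newest-first to mimic LIFO; tryfirst/trylast are not modelled
        return [impl() for impl in reversed(self._impls)]


def make_hook(name: str) -> Callable[[], str]:
    # Each hypothetical hook just reports its own name when called
    return lambda: name


if __name__ == "__main__":
    manager = OrderedHookManager()
    for name in ["hook_1", "hook_2", "hook_3"]:
        manager.register(make_hook(name))
    print(manager.call())  # ['hook_3', 'hook_2', 'hook_1']
```

Under this rule the registration step alone decides the order, which is why changing where or when a hook gets registered can change which one runs first.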

Nok Lam Chan

05/31/2023, 10:53 AM

(Again, this may be undocumented behaviour that I experimented with briefly before) 😅

Iñigo Hidalgo

05/31/2023, 1:29 PM

I haven't encountered a use case where it's relevant yet, but building different hooks that both affect mutable objects (is the context mutable?) could lead to unexpected behaviour, so it would be good to have the execution order documented alongside the explanation of this usage.

Nok Lam Chan

05/31/2023, 1:47 PM

We had a discussion about this last year and made it immutable in `develop`; however, there were some discussions that this may be too strict, and we may end up freezing only the public attributes. https://github.com/kedro-org/kedro/pull/1465 The context was not meant to be changed by the user: it is one of the core objects that control the run, and updating it during the run makes the run very hard to control. The situation you mentioned is something we should avoid as much as possible. In theory it can cause problems, but it seems to work fine for most people. TL;DR: I agree that some advanced docs explaining what happens under the hood, plus this advanced example, would be useful.
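Iñigo's question about mutability has a subtlety worth showing: freezing an object only blocks reassigning its attributes, not mutating the objects those attributes point to. The sketch below uses a standard-library frozen dataclass; `FrozenContext` and its fields are hypothetical and are not how `KedroContext` is actually defined.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class FrozenContext:
    """Hypothetical context-like object; not Kedro's KedroContext."""

    env: str
    params: Dict[str, Any] = field(default_factory=dict)


def try_reassign_env(ctx: FrozenContext, new_env: str) -> bool:
    """Return True if reassignment succeeded, False if the freeze blocked it."""
    try:
        ctx.env = new_env  # type: ignore[misc]
    except FrozenInstanceError:
        return False
    return True


if __name__ == "__main__":
    ctx = FrozenContext(env="local", params={"threshold": 0.5})
    print(try_reassign_env(ctx, "prod"))  # False: the attribute itself is frozen
    # ...but the dict it holds is still mutable, so two hooks could still clash
    ctx.params["threshold"] = 0.9
    print(ctx.params)  # {'threshold': 0.9}
```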

Juan Luis

06/15/2023, 5:44 AM

Opened https://github.com/kedro-org/kedro/issues/2690 to better document this!
