# questions
f
Does anyone know if there is a reason why we could not pass the context to the
before_pipeline_run
hook? In some cases it would be good to have access to the loaded config at that point.
f
I think the correct place to use the context in hooks is considered to be
after_catalog_created
. As for the reason, I do not know, but I think there were some discussions about it in issues on GitHub.
f
For sure, there is also
after_context_created
which has access to the context. However, this use case needs to know which pipeline/nodes are run (to identify the root datasets), plus the catalog and the context (to grab some info from the overall config that was used). So in
after_context_created
I’m missing the pipeline, maybe that can be generated from the config 🤷 In
before_pipeline_run
I’m missing the overall config but not sure there is a reason for this? Maybe to not modify it?
n
Can you describe what you are trying to do here and what do you need?
f
Working on the
kedro-azureml
project. There we want to support local execution of kedro pipelines that use AzureML datasets as root-input datasets. Due to various reasons atm we can’t use
azureml-fsspec
natively and thus the idea we have at the moment is to use a hook that downloads the datasets that serve as the root input datasets when the project is run locally. In the
azureml.yml
config that is created upon using
kedro-azureml
we have information that is used to connect to AzureML and use the SDK to get filepaths and versions of registered data assets. I am not sure this is clear as there are many aspects to this. Happy to answer any more questions. There are some more details in this PR: https://github.com/tomasvanpottelbergh/kedro-azureml/pull/1 And some more related things in this issue and PR: https://github.com/kedro-org/kedro-plugins/issues/200 https://github.com/getindata/kedro-azureml/pull/60
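To make "root input datasets" concrete, here is a minimal sketch: root inputs are the dataset names that some node consumes but no node produces. The `root_inputs` helper, the node tuples and the dataset names are made up for illustration; for a real Kedro pipeline, `Pipeline.inputs()` should give you the same kind of set.

```python
def root_inputs(nodes):
    """Return dataset names that are consumed but never produced."""
    consumed = set()
    produced = set()
    for inputs, outputs in nodes:
        consumed.update(inputs)
        produced.update(outputs)
    return consumed - produced


# Hypothetical pipeline: each node is written as (inputs, outputs).
nodes = [
    (["raw_sales", "raw_customers"], ["joined"]),
    (["joined"], ["features"]),
    (["features"], ["model"]),
]

# These are the datasets a local-run hook would need to download.
roots = root_inputs(nodes)
```

A hook that downloads data for local runs would only need to look at the names in `roots`, since everything else is produced by the pipeline itself.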
n
We try to limit our API surface, and that’s the primary reason why we define the spec and only expose certain arguments in specific hooks, as that’s where we expected people to customise them. (And yes, we don’t expect people to modify config in arbitrary places; as you can imagine, this could quickly become unmaintainable.) Supporting the execution of the local pipeline sounds like a reasonable idea to me. A hook is a stateful object, so you can pass state between its methods. Here is one pattern you can use if the argument isn’t available in the spec:
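from kedro.framework.hooks import hook_impl


class SomeHook:

    @hook_impl
    def after_context_created(self, context):
        ...
        # Keep what you need from the context on the hook instance.
        self.my_azure_config = xxx  # placeholder from the original message

    @hook_impl
    def before_pipeline_run(self, pipeline):
        # The stored state is available here, next to the pipeline.
        do_something_about_my_pipeline(pipeline, self.my_azure_config)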
This should allow you to access the necessary config where you also need the
pipeline
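If it helps to see the mechanics run end to end, here is a minimal sketch in plain Python with no Kedro or pluggy involved. `run_session` is a hand-rolled stand-in for the framework, and `AzureConfigHook`, the dict-shaped `context` and the `"demo-ws"` value are all made up for illustration. The only point is that one hook instance is reused across calls, so state set in an early hook is still there in a later one.

```python
class AzureConfigHook:
    """Hook object that carries state from one hook call to the next."""

    def __init__(self):
        self.azure_config = None

    def after_context_created(self, context):
        # Store what is needed later; `context` is just a dict in this sketch.
        self.azure_config = context["azureml"]

    def before_pipeline_run(self, pipeline):
        # `context` is not an argument here, but the stored state is available.
        return f"{pipeline} uses workspace {self.azure_config['workspace']}"


def run_session(hooks, context, pipeline):
    """Hand-rolled stand-in for the framework: call hooks in lifecycle order."""
    for hook in hooks:
        hook.after_context_created(context)
    return [hook.before_pipeline_run(pipeline) for hook in hooks]


hook = AzureConfigHook()  # one instance, created once and reused
messages = run_session(
    hooks=[hook],
    context={"azureml": {"workspace": "demo-ws"}},  # made-up config
    pipeline="__default__",
)
```

In a real project the equivalent of "one instance, created once" is the hook object you list in `HOOKS` in `settings.py`.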
f
ah of course, thanks I think this should help
n
Awesome 🙂
i
Wow @Nok Lam Chan I never realized that was a possible pattern, I guess I never made it as far as reasoning out that the hooks are passed as instantiated objects. That will be so useful for some things we're trying to get off 0.17.1
n
This is usually the way I recommend when it’s needed; I guess we never documented it.
y
@Nok Lam Chan When the
after_context_created
hook was introduced we had this discussion (and much more) with @Antony Milne here: https://github.com/kedro-org/kedro/pull/1465#issuecomment-1118357158 and we came to the same conclusion. It is likely worth documenting since the question arises from time to time. Most of the original discussion is tracked here: https://github.com/kedro-org/kedro/issues/506, with some in-depth information on how and why it was designed this way. Introducing the
after_context_created
hook instead of adding the context in
before_pipeline_run
was done to:
• ensure consistency between interactive, programmatic and CLI workflows to run kedro
• avoid breaking the signature of the hook to introduce the change faster
n
Thank you @Yolan Honoré-Rougé 😀 I was looking for the issue but I couldn’t find it.
a
Yeah, the fact that hooks are stateful is extremely useful. It’s kind of an “unofficial” workaround (=I think Ivan would not like it 😅) but it’s necessary a lot of the time I think and IMO we should document it. We limit the arguments that are available in a hook specification to ones we think will be useful at that point in the execution timeline, because otherwise everything would be available in every hook and it (a) gets cluttered and overwhelming and (b) probably doesn’t encourage best practice for hook authors. But I would treat the arguments that are available in hook specs as a recommendation of us saying “these are what’s probably useful to you” rather than us saying “you’re not allowed to use anything else in this hook”.
i
I've been playing around with it and it's definitely super useful. But if it becomes documented behavior and "supported", it would be important to specify in which order different Hooks classes are executed (from what @Yolan Honoré-Rougé mentioned here it's "plugin install order then in order of definition")
n
In what way would it affect you? Plugins aren’t supposed to interact with each other. It does sometimes cause confusing behaviour, but I would love to learn more about that. https://github.com/kedro-org/kedro/issues/2493
IIRC, you can force the order by importing the plugin hook in settings, as it’s using pluggy and it’s LIFO. It can get more complicated because the hook implementation marker takes options that ask for a hook to be executed first or last (tryfirst/trylast in pluggy); if multiple hooks declare the same option, the order becomes unclear.
(Again, this may be undocumented behaviour which I only experimented with briefly before) 😅
i
Haven't encountered a use case where it's relevant yet, but just thinking that building different hooks which both affect mutable objects (is the context mutable?) could lead to unexpected behaviors, so it would be good to have the execution order documented alongside the explanation of this usage.
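A tiny sketch of the concern, in plain Python: two hooks write to the same key of a shared mutable object, so the final value depends only on which one is called last. `call_hooks` simply calls hooks in list order and is not how pluggy decides ordering; the class names and paths are made up for illustration.

```python
class SetLocalPath:
    def before_pipeline_run(self, shared):
        shared["data_root"] = "/tmp/local"


class SetRemotePath:
    def before_pipeline_run(self, shared):
        shared["data_root"] = "azureml://remote"


def call_hooks(hooks, shared):
    """Stand-in dispatcher: call each hook in the order given."""
    for hook in hooks:
        hook.before_pipeline_run(shared)
    return shared


# Same two hooks, opposite call order, different end state.
first = call_hooks([SetLocalPath(), SetRemotePath()], {})
second = call_hooks([SetRemotePath(), SetLocalPath()], {})
```

Here `first` ends up with the remote path and `second` with the local one, which is why a documented execution order matters once hooks share mutable state.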
n
We had a discussion about this last year and we made it immutable in
develop
, however, there were some discussions that it may be too strict and we may end up only freezing the public attributes. https://github.com/kedro-org/kedro/pull/1465 It was not meant to be changed by the user: this is one of the core objects that control the run, and updating it during the run makes it very hard to control. The situation that you mentioned is something that we should avoid as much as possible. In theory it can cause problems, but it seems to work fine for most people. TL;DR I agree that some advanced docs explaining what happens under the hood, plus this advanced example, would be useful.
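For anyone unsure what "immutable" means in practice here, a minimal sketch using a frozen dataclass from the standard library. This is only an illustration of the idea and not how Kedro implements its context; `FrozenContext` and its fields are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FrozenContext:
    """Hypothetical stand-in for a context whose attributes cannot be reassigned."""

    env: str
    project_path: str


ctx = FrozenContext(env="local", project_path="/tmp/project")

try:
    # A hook trying to change the context mid-run fails loudly.
    ctx.env = "prod"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Hooks can still read `ctx.env` and copy what they need onto themselves, which is exactly the stateful pattern discussed above.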
j
Opened https://github.com/kedro-org/kedro/issues/2690 to better document this!