Florian d
05/30/2023, 1:48 PM
before_pipeline_run hook? In some cases it would be good to have access to the loaded config at that point.
FlorianGD
05/30/2023, 1:56 PM
after_catalog_created. As for the reason, I do not know, but I think there were some discussions in issues on GitHub.
Florian d
05/30/2023, 2:04 PM
after_context_created which has access to the context. However, this use case needs three things: knowledge of which pipeline/nodes are run (to identify the root datasets), the catalog, and the context (to grab some info from the overall config that was used).
Therefore, in after_context_created I’m missing the pipeline; maybe that can be generated from the config 🤷
In before_pipeline_run I’m missing the overall config, but I’m not sure there is a reason for this. Maybe so that it cannot be modified?
Nok Lam Chan
05/30/2023, 2:06 PM
Florian d
05/30/2023, 2:15 PM
kedro-azureml project. There we want to support local execution of Kedro pipelines that use AzureML datasets as root input datasets. For various reasons we can’t use azureml-fsspec natively at the moment, so the current idea is to use a hook that downloads the root input datasets when the project is run locally.
In the azureml.yml config that is created when using kedro-azureml, we have information that is used to connect to AzureML and to use the SDK to get the file paths and versions of registered data assets.
I am not sure this is clear as there are many aspects to this. Happy to answer any more questions.
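To make "root input datasets" concrete, here is a minimal sketch in plain Python, not taken from kedro-azureml. `Node`, `root_inputs`, `datasets_to_download` and all dataset names are hypothetical stand-ins for illustration. In a real Kedro project, `Pipeline.inputs()` already returns the free inputs of a pipeline, so the set arithmetic below only shows the idea.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a pipeline node: only input/output names."""
    inputs: tuple
    outputs: tuple


def root_inputs(nodes):
    """Return dataset names consumed by some node but produced by none."""
    consumed = {name for node in nodes for name in node.inputs}
    produced = {name for node in nodes for name in node.outputs}
    return consumed - produced


def datasets_to_download(nodes, azure_datasets, run_locally):
    """Pick the root inputs that are Azure-backed, and only for local runs."""
    if not run_locally:
        return set()
    return root_inputs(nodes) & set(azure_datasets)


# Two chained nodes; "features" is Azure-backed but produced by the pipeline,
# so it is not a root input and would not be downloaded.
nodes = [
    Node(inputs=("raw_sales", "params:split"), outputs=("clean_sales",)),
    Node(inputs=("clean_sales", "raw_stores"), outputs=("features",)),
]
azure_datasets = {"raw_sales", "features"}

print(sorted(root_inputs(nodes)))
print(sorted(datasets_to_download(nodes, azure_datasets, run_locally=True)))
```

The selection needs both the pipeline (to find the root inputs) and the project config (to know which datasets are Azure-backed and whether the run is local), which is why a single hook with only one of the two is not enough.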
There are some more details in this PR: https://github.com/tomasvanpottelbergh/kedro-azureml/pull/1
And some more related things in this issue and PR:
https://github.com/kedro-org/kedro-plugins/issues/200
https://github.com/getindata/kedro-azureml/pull/60
Nok Lam Chan
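05/30/2023, 2:31 PM
from kedro.framework.hooks import hook_impl

class SomeHook:
    @hook_impl
    def after_context_created(self, context):
        ...
        # Keep what is needed from the context on the hook instance (xxx is a placeholder).
        self.my_azure_config = xxx

    @hook_impl
    def before_pipeline_run(self, pipeline):
        # The same instance is called again later, so the stored config is available here.
        do_something_about_my_pipeline(pipeline, self.my_azure_config)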
pipeline
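The snippet above relies on one hook instance receiving both calls, so state saved in after_context_created is still there in a later hook. Below is a runnable sketch of that idea under stated assumptions. The `azure_config` attribute on the context, the `"workspace"` key and the dataset names are all hypothetical. The `SimpleNamespace` objects only stand in for Kedro's context and pipeline. The fallback decorator exists solely so the sketch runs without Kedro installed.

```python
from types import SimpleNamespace

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch also runs without Kedro
    def hook_impl(func):
        return func


class RootInputHook:
    """Carries state from one hook call to a later one on the same instance."""

    def __init__(self):
        self.azure_config = None
        self.seen = []

    @hook_impl
    def after_context_created(self, context):
        # Assumption: the stand-in context exposes the config as `azure_config`.
        # A real project would read it through its own config loader setup.
        self.azure_config = context.azure_config

    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # Pipeline.inputs() gives the free (root) inputs of the pipeline.
        for name in sorted(pipeline.inputs()):
            self.seen.append((name, self.azure_config["workspace"]))


# Call the hooks by hand, in the order a `kedro run` would trigger them.
hook = RootInputHook()
hook.after_context_created(SimpleNamespace(azure_config={"workspace": "demo-ws"}))
hook.before_pipeline_run(
    run_params={},
    pipeline=SimpleNamespace(inputs=lambda: {"raw_b", "raw_a"}),
    catalog=None,
)
print(hook.seen)
```

In a project this would only work if a single instance is registered, for example `HOOKS = (RootInputHook(),)` in settings.py.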
Florian d
05/30/2023, 2:47 PM
Nok Lam Chan
05/30/2023, 3:03 PM
Iñigo Hidalgo
05/30/2023, 4:41 PM
Nok Lam Chan
05/30/2023, 4:53 PM
Yolan Honoré-Rougé
05/30/2023, 6:44 PM
When the after_context_created hook was introduced we had this discussion (and much more) with @Antony Milne here: https://github.com/kedro-org/kedro/pull/1465#issuecomment-1118357158 and we came to the same conclusion. It is likely worth documenting, since the question arises from time to time. Most of the original discussion is tracked here: https://github.com/kedro-org/kedro/issues/506, with some in-depth information on how and why it was designed this way. Introducing the after_context_created hook instead of adding the context to before_pipeline_run was done to:
• ensure consistency between the interactive, programmatic and CLI workflows for running kedro
• avoid breaking the signature of the hook, so the change could be introduced faster
Nok Lam Chan
05/30/2023, 6:45 PM
Antony Milne
05/30/2023, 7:28 PM
Iñigo Hidalgo
05/31/2023, 9:25 AM
Nok Lam Chan
05/31/2023, 10:50 AM
Iñigo Hidalgo
05/31/2023, 1:29 PM
Nok Lam Chan
05/31/2023, 1:47 PM
develop, however, there were some discussions that it may be too strict, and we may end up freezing only the public attributes.
https://github.com/kedro-org/kedro/pull/1465
It was not meant to be changed by the user. This is one of the core objects that control the run, and updating it during the run makes the run very hard to control.
The situation that you mentioned is something that we should avoid as much as possible. In theory it can cause problems, but it seems to work fine for most people.
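For readers unfamiliar with what "freezing" an object means here, a generic Python illustration follows. It uses a frozen dataclass and is not Kedro's actual context class or mechanism. `FrozenContext` and its fields are hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenContext:
    """Hypothetical stand-in for a run context; not Kedro's actual class."""
    env: str
    project_path: str


ctx = FrozenContext(env="local", project_path="/tmp/project")

try:
    ctx.env = "prod"  # e.g. a hook trying to change the context mid-run
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated, ctx.env)
```

With a frozen object, the assignment fails loudly instead of silently changing the run's behaviour. That is the trade-off discussed above between strictness and flexibility.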
TL;DR I agree that some advanced docs explaining what happens under the hood, plus this advanced example, would be useful.
Juan Luis
06/15/2023, 5:44 AM