#questions

Hugo Evers

01/11/2024, 12:53 PM
Is it possible to modify which hooks are active based on the env? So my hook is only active when I run
kedro run --env=aws_batch
but not on other envs? I am looking at implementing a mechanism for this in settings.py, but I don't know how to modify the active hooks from the config loader class; the registered hooks don't seem to be accessible there.

datajoely

01/11/2024, 12:54 PM
so I actually think this happens too late in the lifecycle, but what you could do is use the
KEDRO_ENV
environment variable way of setting envs and pick that up in the hook?

Hugo Evers

01/11/2024, 12:55 PM
yeah, but then the hook is responsible for deciding when to be active, which would be counterintuitive
basically, I want to log errors in pipelines when running on AWS Batch, send the logs and traceback to ChatGPT, and then send ChatGPT's analysis to a Slack channel
but I don't want that to happen every time I run a pipeline locally, only when I specify it. And there could be different scenarios in which I want to disable the hook. I'd rather just change the HOOKS variable in settings.py than accommodate every scenario in the hook
also, because I use AWS Batch for deployment, I don't use a .env file or env variables; everything is set in the kedro run command

datajoely

01/11/2024, 12:58 PM
I guess a dirty solution here is that AWS batch will have environment variables that are only present online
and you could use that as your disabling logic
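A minimal sketch of that idea, assuming the job runs in a container started by AWS Batch, which sets AWS_BATCH_JOB_ID for each job. The helper name running_on_aws_batch and the placeholder ErrorAnalysisHook class are hypothetical; in a real project the last line would live in settings.py and import the actual hook.

```python
import os
from typing import Mapping


class ErrorAnalysisHook:
    """Placeholder for the real hook class (hypothetical, for illustration)."""


def running_on_aws_batch(environ: Mapping[str, str] = os.environ) -> bool:
    # AWS Batch injects AWS_BATCH_JOB_ID into each job container, so its
    # presence is a reasonable signal that the run is not local.
    return "AWS_BATCH_JOB_ID" in environ


# In settings.py: only instantiate and register the hook when on AWS Batch.
HOOKS = (ErrorAnalysisHook(),) if running_on_aws_batch() else ()
```

Because the hook is never instantiated off AWS Batch, any configuration it needs at construction time does not have to exist locally. The same pattern would work with KEDRO_ENV in place of AWS_BATCH_JOB_ID.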

Hugo Evers

01/11/2024, 12:59 PM
thanks! there are indeed a bunch of workarounds. I saw there is a DISABLE_HOOKS_FOR_PLUGINS setting, which made me think there might be some hook activation logic I can hook into,
for example a list of hooks in the config loader, the KedroSession, or the CLI

datajoely

01/11/2024, 1:00 PM
unless your hook is included in a plug-in that wouldn’t work
but you could look at how
kedro-telemetry
works as a super simple plugin which can be disabled that way

Hugo Evers

01/11/2024, 1:01 PM
but do you know where the hooks are "stored"?
my guess would be the KedroSession, so I'll start looking there

datajoely

01/11/2024, 1:03 PM

Hugo Evers

01/11/2024, 3:26 PM
okay, after a little bit of digging: the hooks are registered in the KedroSession object, which is created in the run method of the CLI. We can access the registered hooks on the session object and deregister hooks there. The env and other run params are also available there.
def run(
    tag,
    env,
    runner,
    is_async,
    node_names,
    to_nodes,
    from_nodes,
    from_inputs,
    to_outputs,
    load_version,
    pipeline,
    config,
    conf_source,
    params,
):

    ....
    with KedroSession.create(env=env, extra_params=params) as session:
        context = session.load_context()
        runner_instance = _instantiate_runner(runner, is_async, context)

        session._hook_manager.unregister() #<--------- Here we can unregister

        session.run(
            tags=tag,
            runner=runner_instance,
            node_names=node_names,
            from_nodes=from_nodes,
            to_nodes=to_nodes,
            from_inputs=from_inputs,
            to_outputs=to_outputs,
            load_versions=load_version,
            pipeline_name=PIPELINE_NAME,
        )
So I'm thinking the least dirty solution is to keep the conditions for when to unregister which hooks in the settings.py file, and have a very minimal function in cli.py that executes this.
I can attach this logic to the config loader class and execute it in the run method of the CLI. That would be the most kedro-nic implementation, right?

datajoely

01/11/2024, 3:54 PM
Yes - so this is a private method, so it will work and I’d like to flag this to the developers BUT the one caveat with private methods is if we change this in the future it won’t be counted as a breaking change.

Hugo Evers

01/11/2024, 3:56 PM
okay, I think I'd like to make the hook plugin public at some point if it proves to be useful for my clients. But the mechanism for enabling and disabling it conditionally is something I'll maintain myself (until Kedro implements a mechanism like a hooks.yaml that can be edited in the config/settings)

datajoely

01/11/2024, 3:58 PM
Yeah I’ve asked the question to the team, we’re still a bit thin on resources coming back from the holiday but I think it’s a really good point
if you have time, a GitHub issue explaining your use case would be invaluable

Hugo Evers

01/11/2024, 3:59 PM
okay, I'll finish this implementation first and circle back. For now I'm going with a method on the config loader that returns a list of disabled hooks given the params passed to the run method

Nok Lam Chan

01/11/2024, 5:20 PM
if you don't want to touch the private method, could you just make the hook a no-op conditional on the run env? (effectively unregistering the hook)
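A rough sketch of that no-op approach, assuming the run_params dict passed to on_pipeline_error carries the active environment under the "env" key. The class name, the active_envs argument, and the reported list (which stands in for the ChatGPT and Slack calls) are hypothetical.

```python
try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch also runs where Kedro is not installed.
    def hook_impl(func):
        return func


class ErrorAnalysisHook:
    """Hypothetical hook that only acts for selected Kedro environments."""

    def __init__(self, active_envs=("aws_batch",)):
        self.active_envs = set(active_envs)
        self.reported = []  # stands in for the ChatGPT/Slack calls

    @hook_impl
    def on_pipeline_error(self, error, run_params):
        # Do nothing unless the run was started with an env we care about.
        if run_params.get("env") not in self.active_envs:
            return
        self.reported.append(repr(error))
```

The hook stays registered for every run, so no private session attribute is touched; the trade-off is that the activation rule lives inside the hook rather than in settings.py.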

Hugo Evers

01/11/2024, 5:28 PM
my solution now looks like this: in cli.py, inside def run, after with KedroSession.create(env=env, extra_params=params) as session:
        run_args = extract_function_params(run, locals())

        for hook in context.config_loader.disable_hooks(run_args):
            session._hook_manager.unregister(name=hook._name_)
where
import copy


def extract_function_params(func, local_vars):
    """
    Extracts the parameters of a given function based on its signature.
    Returns a copy of these parameters to ensure immutability.
    """
    # func is the click command; its wrapped function is func.callback.
    # The first co_argcount entries of co_varnames are the argument names.
    param_names = func.callback.__code__.co_varnames[
        : func.callback.__code__.co_argcount
    ]
    return {
        param: copy.deepcopy(local_vars[param])
        for param in param_names
        if param in local_vars
    }
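As a side note, a variant of the helper above can read the parameter names with inspect.signature instead of the code object, which would also pick up keyword-only parameters. This is only a sketch: the _run function and the SimpleNamespace stand-in for the click command (which exposes the wrapped function as .callback) are made up for illustration.

```python
import copy
import inspect
from types import SimpleNamespace


def extract_function_params(func, local_vars):
    """Variant that reads parameter names via inspect.signature."""
    param_names = inspect.signature(func.callback).parameters
    return {
        name: copy.deepcopy(local_vars[name])
        for name in param_names
        if name in local_vars
    }


def _run(env, tag, params):
    scratch = "not a parameter"
    # locals() also contains `scratch`; only env, tag and params are kept.
    return extract_function_params(run, locals())


# Stand-in for the click command object (hypothetical, for illustration).
run = SimpleNamespace(callback=_run)

run_args = _run(env="aws_batch", tag=("nightly",), params={})
print(run_args)  # {'env': 'aws_batch', 'tag': ('nightly',), 'params': {}}
```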
and in settings.py
from kedro.config import OmegaConfigLoader

# ErrorAnalysisHook is the project's own hook class and must be imported here.


class OmegaConfigLoader(OmegaConfigLoader):
    def __init__(self, *args, **kwargs):
        kwargs["runtime_params"] = kwargs.get("runtime_params")
        super().__init__(*args, **kwargs)

    def disable_hooks(self, run_params: dict) -> list:
        """run_params are:
        tag,
        env,
        runner,
        is_async,
        node_names,
        to_nodes,
        from_nodes,
        from_inputs,
        to_outputs,
        load_version,
        pipeline,
        config,
        conf_source,
        params
        """
        disabled_hooks = []

        # Only keep the error-analysis hook when running with --env=aws_batch.
        if run_params["env"] != "aws_batch":
            disabled_hooks.append(ErrorAnalysisHook)

        return disabled_hooks
one very obvious downside to this approach is that the hook is still initialised when the context gets created. So if the hook requires certain configs that are only accessible in the scenario you want the hook to run in, you'll have to deal with some try/except error handling, which makes it a bit brittle (because there can of course be misconfiguration)
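One possible way to soften that downside, sketched here under assumptions, is to defer any config-dependent setup until the hook first needs it, so constructing the hook never touches scenario-specific settings. The load_settings callable and the settings property are hypothetical names.

```python
class ErrorAnalysisHook:
    """Hypothetical hook that defers config-dependent setup until needed."""

    def __init__(self, load_settings):
        # Store the loader only; nothing scenario-specific runs here.
        self._load_settings = load_settings
        self._settings = None

    @property
    def settings(self):
        # Resolved on first access, so a missing setting only fails in
        # runs where the hook actually fires.
        if self._settings is None:
            self._settings = self._load_settings()
        return self._settings
```

With this shape, a misconfiguration still raises, but only at the point where the hook is used rather than every time the context is created.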

datajoely

01/15/2024, 6:06 PM
thanks for posting your update
it’s a very good and sophisticated solution, but I’d like to make this easier in the future
thanks for being a Kedroid @Hugo Evers!