Is it possible to modify which hooks are active based on the Kedro #questions

Hugo Evers

01/11/2024, 12:53 PM

Is it possible to modify which hooks are active based on the env? So that my hook is only active when I run `kedro run --env=aws_batch`, but not on other envs? I am looking at implementing a mechanism for this in settings.py, but I don't know how to modify the active hooks in the config loader class; the registered hooks don't seem to be accessible there.

datajoely

01/11/2024, 12:54 PM

so I actually think this happens too late in the lifecycle, but what you could do is use the `KEDRO_ENV` environment variable way of setting envs and pick that up in the hook?
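A minimal sketch of that idea, assuming the environment is exported as `KEDRO_ENV` before `kedro run` starts. `select_hooks` and the stand-in `ErrorAnalysisHook` class are hypothetical names used only for illustration; in a real project the result would be assigned to `HOOKS` in settings.py.

```python
import os


class ErrorAnalysisHook:
    """Stand-in for the project's real hook class (hypothetical)."""


def select_hooks(environ=os.environ):
    """Return the hook instances to register for the current KEDRO_ENV."""
    hooks = []
    # Assumption: the hook should only be registered for the aws_batch env.
    if environ.get("KEDRO_ENV") == "aws_batch":
        hooks.append(ErrorAnalysisHook())
    return tuple(hooks)


# In settings.py this would become: HOOKS = select_hooks()
HOOKS = select_hooks()
```

Note that this only sees the environment variable. An env passed as `--env=aws_batch` on the command line does not show up in `os.environ`, which is the limitation discussed in the replies below.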

Hugo Evers

01/11/2024, 12:55 PM

yeah, but then the hook is responsible for picking up when to be active, which would be counterintuitive

Hugo Evers

01/11/2024, 12:56 PM

basically, i want to log errors in pipelines when running on aws batch, send the logs and traceback to chatgpt, and then send chatgpt’s analysis to a slack channel

Hugo Evers

01/11/2024, 12:57 PM

but I don't want that to happen every time I run a pipeline locally, only when I specify it. And there could be different scenarios in which I want to disable the hook. I'd rather just change the HOOKS variable in settings.py than accommodate every scenario in the hook

Hugo Evers

01/11/2024, 12:58 PM

also, because I use AWS Batch for deployment, I don't use a .env file or env variables; everything is set in the kedro run command

datajoely

01/11/2024, 12:58 PM

I guess a dirty solution here is that AWS batch will have environment variables that are only present online

datajoely

01/11/2024, 12:58 PM

and you could use that as your disabling logic
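As a rough sketch of that workaround: AWS Batch injects variables such as `AWS_BATCH_JOB_ID` into the job's container, so their presence can serve as the switch. The helper name `running_on_aws_batch` is hypothetical.

```python
import os


def running_on_aws_batch(environ=os.environ):
    """Guess whether the current process runs inside an AWS Batch job."""
    # AWS Batch sets AWS_BATCH_JOB_ID in the job's container environment;
    # it is normally absent on a developer machine.
    return bool(environ.get("AWS_BATCH_JOB_ID"))
```

A hook could call this at the top of each hook method and return early when it is false.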

Hugo Evers

01/11/2024, 12:59 PM

thanks! there are indeed a bunch of workarounds. I saw there is a DISABLE_HOOKS_FOR_PLUGINS, that made me think there might be some hook activation logic i can hook into.

Hugo Evers

01/11/2024, 1:00 PM

for example a list of hooks in the configloader, or kedrosession, or cli

datajoely

01/11/2024, 1:00 PM

unless your hook is included in a plug-in that wouldn’t work

datajoely

01/11/2024, 1:00 PM

but you could look at how `kedro-telemetry` works as a super simple plugin which can be disabled that way

Hugo Evers

01/11/2024, 1:01 PM

but do you know where the hooks are “stored”?

Hugo Evers

01/11/2024, 1:01 PM

my guess would be the KedroSession, so I'll start looking there

datajoely

01/11/2024, 1:03 PM

hooks are registered via the entrypoints described here https://docs.kedro.org/en/latest/extend_kedro/plugins.html#example-of-a-simple-plugin

Hugo Evers

01/11/2024, 3:26 PM

okay, after a little bit of digging: the hooks are registered on the KedroSession object, which is created in the run method of the CLI. We can access the registered hooks on the session object and deregister hooks there. The env and other run params are also available there.

def run(
    tag,
    env,
    runner,
    is_async,
    node_names,
    to_nodes,
    from_nodes,
    from_inputs,
    to_outputs,
    load_version,
    pipeline,
    config,
    conf_source,
    params,
):

    ....
    with KedroSession.create(env=env, extra_params=params) as session:
        context = session.load_context()
        runner_instance = _instantiate_runner(runner, is_async, context)

        session._hook_manager.unregister()  # <-- here we can unregister hooks (pass plugin= or name=)

        session.run(
            tags=tag,
            runner=runner_instance,
            node_names=node_names,
            from_nodes=from_nodes,
            to_nodes=to_nodes,
            from_inputs=from_inputs,
            to_outputs=to_outputs,
            load_versions=load_version,
            pipeline_name=PIPELINE_NAME,
        )

Hugo Evers

01/11/2024, 3:29 PM

So I'm thinking the least dirty solution is to have the conditions detailing when to `unregister` which hooks in the settings.py file, and have a very minimal function in the CLI.py that executes this.

Hugo Evers

01/11/2024, 3:31 PM

I can register this logic with the config_loader_class and execute it in the `run` method of the CLI; that would be the most kedro-nic implementation, right?

datajoely

01/11/2024, 3:54 PM

Yes - so this is a private method, so it will work and I’d like to flag this to the developers BUT the one caveat with private methods is if we change this in the future it won’t be counted as a breaking change.

Hugo Evers

01/11/2024, 3:56 PM

okay, I think I'd like to make the hook plugin public at some point if it proves to be useful for my clients. But the mechanism for enabling and disabling it conditionally is something I'll maintain (until Kedro implements a mechanism like a hooks.yaml that can be edited in the config/settings)

datajoely

01/11/2024, 3:58 PM

Yeah I’ve asked the question to the team, we’re still a bit thin on resources coming back from the holiday but I think it’s a really good point

datajoely

01/11/2024, 3:58 PM

if you have time a GitHub issue explaining your usecase would be invaluable

Hugo Evers

01/11/2024, 3:59 PM

okay, I'll finish this implementation first and circle back. For now I'm going with a method on the config loader that returns a list of disabled hooks given the params passed to the run method

🚀 1

Nok Lam Chan

01/11/2024, 5:20 PM

if you don't want to touch the private method, could you just do a no-op conditional of the run env? (effectively unregistering the hook)
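A sketch of what such a no-op conditional could look like, assuming the hook reacts to `on_pipeline_error` and that the `run_params` dict passed to it carries the run's `env`. The class body, `ACTIVE_ENVS`, and the `reported` list are illustrative stand-ins for the real analysis and Slack logic.

```python
try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch also runs without Kedro installed
    def hook_impl(func):
        return func


class ErrorAnalysisHook:
    """Hypothetical hook that stays silent outside selected environments."""

    ACTIVE_ENVS = {"aws_batch"}  # assumption: envs in which the hook should act

    def __init__(self):
        self.reported = []

    @hook_impl
    def on_pipeline_error(self, error, run_params):
        # No-op for every env that is not explicitly enabled.
        if run_params.get("env") not in self.ACTIVE_ENVS:
            return
        # Placeholder for the real work (collect traceback, notify, ...).
        self.reported.append(repr(error))
```

The hook stays registered in every run, but it only does work when the run's env matches, so no private session attributes are touched.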

Hugo Evers

01/11/2024, 5:28 PM

my solution now looks like this: in CLI.py, inside `def run`, right after `with KedroSession.create(env=env, extra_params=params) as session:`

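        run_args = extract_function_params(run, locals())

        # disable_hooks() returns hook classes; unregister each one by name.
        # Assumes each hook was registered under its class name.
        for hook in context.config_loader.disable_hooks(run_args):
            session._hook_manager.unregister(name=hook.__name__)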

where

import copy


def extract_function_params(func, local_vars):
    """
    Extracts the parameters of a given function based on its signature.
    Returns a copy of these parameters to ensure immutability.
    """
    # func is the click command; its underlying function is func.callback.
    param_names = func.callback.__code__.co_varnames[
        : func.callback.__code__.co_argcount
    ]
    return {
        param: copy.deepcopy(local_vars[param])
        for param in param_names
        if param in local_vars
    }

and in settings.py

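from kedro.config import OmegaConfigLoader


# The subclass reuses the imported name, shadowing it inside settings.py.
class OmegaConfigLoader(OmegaConfigLoader):
    def __init__(self, *args, **kwargs):
        kwargs["runtime_params"] = kwargs.get("runtime_params")
        super().__init__(*args, **kwargs)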

    def disable_hooks(self, run_params: dict) -> list:
        """run_params are:
        tag,
        env,
        runner,
        is_async,
        node_names,
        to_nodes,
        from_nodes,
        from_inputs,
        to_outputs,
        load_version,
        pipeline,
        config,
        conf_source,
        params
        """
        disabled_hooks = []

        if run_params["env"] != "aws_batch":
            disabled_hooks.append(ErrorAnalysisHook)

        return disabled_hooks

Hugo Evers

01/15/2024, 5:08 PM

One very obvious downside to this approach is that the hook is initialised when the context gets created. So if the hook requires configs that are only accessible in the scenario you want the hook to run in, you'll have to add some try/except error handling, which makes it a bit brittle (because there can of course be misconfiguration).
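One possible way to soften this, sketched under assumptions: defer loading the environment-specific settings until the hook actually fires, so that constructing the hook never touches them. `settings_loader`, the `channel` key, and the `messages` list are hypothetical and not part of the solution posted above.

```python
try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch also runs without Kedro installed
    def hook_impl(func):
        return func


class ErrorAnalysisHook:
    """Hypothetical hook that loads its settings lazily."""

    def __init__(self, settings_loader):
        # Store only the callable; nothing env-specific is read here.
        self._settings_loader = settings_loader
        self._settings = None
        self.messages = []

    @property
    def settings(self):
        # Load on first use and cache the result.
        if self._settings is None:
            self._settings = self._settings_loader()
        return self._settings

    @hook_impl
    def on_pipeline_error(self, error, run_params):
        if run_params.get("env") != "aws_batch":
            return  # settings are never loaded in other envs
        self.messages.append(f"{self.settings['channel']}: {error!r}")
```

With this shape, a misconfiguration only surfaces in the environment where the hook is meant to run, rather than at hook construction in every run.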

👍 1

datajoely

01/15/2024, 6:06 PM

thanks for posting your update

datajoely

01/15/2024, 6:07 PM

it’s a very good and sophisticated solution, but I’d like to make this easier in the future

datajoely

01/15/2024, 6:07 PM

thanks for being a Kedroid @Hugo Evers!

🙏 1
