Hi team I have been trying to play with hooks and followed y Kedro #questions

Hi team, I have been trying to play with hooks and...

Dustin

01/25/2023, 12:26 AM

Hi team, I have been trying to play with hooks and followed your doc to implement both memory profile and pipeline time hooks (just copied your scripts from doc) and registered in settings.py but no hook related information is shown in the console log with

kedro run

(no error but same console information without hooks). Just wondering do i need to do something to 'reload' settings? or

Dustin

01/25/2023, 12:29 AM

btw, there was a typo in one of your sample scripts which will raise error, i will submit a PR for that

👍 1

Deepyaman Datta

01/25/2023, 5:26 AM

Most likely it's running, but if you copied something like

MemoryProfilingHooks

it's logging output instead of printing to screen. Here's a dumb one I just tested that doesn't have any extra dependencies:

Copy code

from kedro.framework.hooks import hook_impl

import logging


def _normalise_mem_usage(mem_usage):
    # memory_profiler < 0.56.0 returns list instead of float
    return mem_usage[0] if isinstance(mem_usage, (list, tuple)) else mem_usage


class MemoryProfilingHooks:
    def __init__(self):
        self._mem_usage = {}

    @property
    def _logger(self):
        return logging.getLogger(self.__class__.__name__)

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str) -> None:
        before_mem_usage = 99
        before_mem_usage = _normalise_mem_usage(before_mem_usage)
        self._mem_usage[dataset_name] = before_mem_usage

    @hook_impl
    def after_dataset_loaded(self, dataset_name: str) -> None:
        after_mem_usage = 1
        # memory_profiler < 0.56.0 returns list instead of float
        after_mem_usage = _normalise_mem_usage(after_mem_usage)

        print(  # Print to console
            "Loading %s consumed %2.2fMiB memory",
            dataset_name,
            after_mem_usage - self._mem_usage[dataset_name],
        )

Dustin

01/25/2023, 5:43 AM

just checked "logs->info.log" and no hook or MemoryProfilingHooks related info 👀

Dustin

01/25/2023, 5:47 AM

This is what in this hook and it is exactly the same sample code from your doc except that the previous typo in doc (already submitted a PR on this) which I had removed.

Copy code

import sys
from typing import Any, Dict

import statsd
from kedro.framework.hooks import hook_impl
from kedro.pipeline.node import Node

from memory_profiler import memory_usage
import logging


class PipelineMonitoringHooks:
    def __init__(self):
        self._timers = {}
        self._client = statsd.StatsClient(prefix="kedro")

    @hook_impl
    def before_node_run(self, node: Node) -> None:
        node_timer = self._client.timer(node.name)
        node_timer.start()
        self._timers[node.short_name] = node_timer

    @hook_impl
    def after_node_run(self, node: Node, inputs: Dict[str, Any]) -> None:
        self._timers[node.short_name].stop()
        for dataset_name, dataset_value in inputs.items():
            self._client.gauge(dataset_name + "_size", sys.getsizeof(dataset_value))

    @hook_impl
    def after_pipeline_run(self):
        self._client.incr("run")


def _normalise_mem_usage(mem_usage):
    # memory_profiler < 0.56.0 returns list instead of float
    return mem_usage[0] if isinstance(mem_usage, (list, tuple)) else mem_usage


class MemoryProfilingHooks:
    def __init__(self):
        self._mem_usage = {}

    @property
    def _logger(self):
        return logging.getLogger(self.__class__.__name__)

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str) -> None:
        before_mem_usage = memory_usage(
            -1,
            interval=0.1,
            max_usage=True,
            retval=True,
            include_children=True,
        )
        before_mem_usage = _normalise_mem_usage(before_mem_usage)
        self._mem_usage[dataset_name] = before_mem_usage

    @hook_impl
    def after_dataset_loaded(self, dataset_name: str) -> None:
        after_mem_usage = memory_usage(
            -1,
            interval=0.1,
            max_usage=True,
            retval=True,
            include_children=True,
        )
        # memory_profiler < 0.56.0 returns list instead of float
        after_mem_usage = _normalise_mem_usage(after_mem_usage)

        <http://self._logger.info|self._logger.info>(
            "Loading %s consumed %2.2fMiB memory",
            dataset_name,
            after_mem_usage - self._mem_usage[dataset_name],
        )

Deepyaman Datta

01/25/2023, 6:07 AM

Can you type

breakpoint()

in the body of, say, the

before_dataset_loaded

hook?

Dustin

01/25/2023, 6:15 AM

sorry, you meant body of it

Antony Milne

01/25/2023, 10:18 AM

Agree with @Deepyaman Datta that the hooks are indeed running, but just not being logged (to either screen or file). The best fix would be this: 1. Completely delete the

def _logger

2. Change

self._<http://logger.info|logger.info>

logging.getLogger(__name__).info

Antony Milne

01/25/2023, 10:18 AM

Thank you for raising a PR to fix the docs. The quoted output from

kedro run

will look a bit different now, so if you don’t mind updating that at the same time that would be amazing, thank you! 🙏

Dustin

01/25/2023, 10:42 AM

Thanks for your help, this is to confirm it is now shown in BOTH console log and logs (don't mind it shows in both 🙂 ) by implementing your solution.

🥳 1

Dustin

01/25/2023, 10:47 AM

Two questions, 1. do you want me to raise PR to update the sample code with your solution? (I am more than happy to do so) 2. Regarding your second comment "`kedro run`` will look different and would prefer an update as well -> I am not sure what you are referring to but I assume it is referring to the console log and it looks like the following with your solution. If that's the case, happy to do this update as well and will mention you in the PR so you can communicate internally. Please let me your thoughts on 1&2, cheers.

Copy code

[01/25/23 21:38:23] INFO     Loading data from 'example_iris_data' (CSVDataSet)...                                                                                                                                                                                    data_catalog.py:343
                    INFO     Loading example_iris_data consumed 0.99MiB memory                                                                                                                                                                                                hooks.py:67
                    INFO     Loading data from 'parameters' (MemoryDataSet)...                                                                                                                                                                                        data_catalog.py:343
                    INFO     Loading parameters consumed 0.48MiB memory                                                                                                                                                                                                       hooks.py:67
                    INFO     Running node: split: split_data([example_iris_data,parameters]) -> [X_train,X_test,y_train,y_test]                                                                                                                                               node.py:327
                    INFO     Saving data to 'X_train' (MemoryDataSet)...                                                                                                                                                                                              data_catalog.py:382
                    INFO     Saving data to 'X_test' (MemoryDataSet)...                                                                                                                                                                                               data_catalog.py:382
                    INFO     Saving data to 'y_train' (MemoryDataSet)...                                                                                                                                                                                              data_catalog.py:382
                    INFO     Saving data to 'y_test' (MemoryDataSet)...                                                                                                                                                                                               data_catalog.py:382
                    INFO     Completed 1 out of 3 tasks                                                                                                                                                                                                           sequential_runner.py:85
                    INFO     Loading data from 'X_train' (MemoryDataSet)...                                                                                                                                                                                           data_catalog.py:343
                    INFO     Loading X_train consumed 0.49MiB memory                                                                                                                                                                                                          hooks.py:67
                    INFO     Loading data from 'X_test' (MemoryDataSet)...

Antony Milne

01/25/2023, 11:04 AM

Good to hear, thanks for the update @Dustin. 1. Yes please! 2. That’s exactly right - the console log formatting is just a bit different now compared to the log given in the docs at the moment, so if you don’t mind updating that at the same time it would be very helpful.

Dustin

01/25/2023, 11:05 AM

on both 1&2 now and will mention you in PR

👍 1

Dustin

01/25/2023, 11:25 AM

https://github.com/kedro-org/kedro/pull/2257 @Antony Milne

Jo Stichbury

01/25/2023, 2:14 PM

Thanks @Dustin for the above PR (and the one you made last week!). We appreciate your help in tracking down those documentation gremlins 👾

🙏 1

👍 1

24 Views

Open in Slack

Previous Next