Hi team, I have been trying to play with hooks and...
# questions
d
Hi team, I have been trying to play with hooks and followed your doc to implement both memory profile and pipeline time hooks (just copied your scripts from doc) and registered in settings.py but no hook related information is shown in the console log with
kedro run
(no error but same console information without hooks). Just wondering do i need to do something to 'reload' settings? or
btw, there was a typo in one of your sample scripts which will raise error, i will submit a PR for that
๐Ÿ‘ 1
d
Most likely it's running, but if you copied something like
MemoryProfilingHooks
it's logging output instead of printing to screen. Here's a dumb one I just tested that doesn't have any extra dependencies:
Copy code
from kedro.framework.hooks import hook_impl

import logging


def _normalise_mem_usage(mem_usage):
    # memory_profiler < 0.56.0 returns list instead of float
    return mem_usage[0] if isinstance(mem_usage, (list, tuple)) else mem_usage


class MemoryProfilingHooks:
    def __init__(self):
        self._mem_usage = {}

    @property
    def _logger(self):
        return logging.getLogger(self.__class__.__name__)

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str) -> None:
        before_mem_usage = 99
        before_mem_usage = _normalise_mem_usage(before_mem_usage)
        self._mem_usage[dataset_name] = before_mem_usage

    @hook_impl
    def after_dataset_loaded(self, dataset_name: str) -> None:
        after_mem_usage = 1
        # memory_profiler < 0.56.0 returns list instead of float
        after_mem_usage = _normalise_mem_usage(after_mem_usage)

        print(  # Print to console
            "Loading %s consumed %2.2fMiB memory",
            dataset_name,
            after_mem_usage - self._mem_usage[dataset_name],
        )
d
just checked "logs->info.log" and no hook or MemoryProfilingHooks related info ๐Ÿ‘€
This is what in this hook and it is exactly the same sample code from your doc except that the previous typo in doc (already submitted a PR on this) which I had removed.
Copy code
import sys
from typing import Any, Dict

import statsd
from kedro.framework.hooks import hook_impl
from kedro.pipeline.node import Node

from memory_profiler import memory_usage
import logging


class PipelineMonitoringHooks:
    def __init__(self):
        self._timers = {}
        self._client = statsd.StatsClient(prefix="kedro")

    @hook_impl
    def before_node_run(self, node: Node) -> None:
        node_timer = self._client.timer(node.name)
        node_timer.start()
        self._timers[node.short_name] = node_timer

    @hook_impl
    def after_node_run(self, node: Node, inputs: Dict[str, Any]) -> None:
        self._timers[node.short_name].stop()
        for dataset_name, dataset_value in inputs.items():
            self._client.gauge(dataset_name + "_size", sys.getsizeof(dataset_value))

    @hook_impl
    def after_pipeline_run(self):
        self._client.incr("run")


def _normalise_mem_usage(mem_usage):
    # memory_profiler < 0.56.0 returns list instead of float
    return mem_usage[0] if isinstance(mem_usage, (list, tuple)) else mem_usage


class MemoryProfilingHooks:
    def __init__(self):
        self._mem_usage = {}

    @property
    def _logger(self):
        return logging.getLogger(self.__class__.__name__)

    @hook_impl
    def before_dataset_loaded(self, dataset_name: str) -> None:
        before_mem_usage = memory_usage(
            -1,
            interval=0.1,
            max_usage=True,
            retval=True,
            include_children=True,
        )
        before_mem_usage = _normalise_mem_usage(before_mem_usage)
        self._mem_usage[dataset_name] = before_mem_usage

    @hook_impl
    def after_dataset_loaded(self, dataset_name: str) -> None:
        after_mem_usage = memory_usage(
            -1,
            interval=0.1,
            max_usage=True,
            retval=True,
            include_children=True,
        )
        # memory_profiler < 0.56.0 returns list instead of float
        after_mem_usage = _normalise_mem_usage(after_mem_usage)

        <http://self._logger.info|self._logger.info>(
            "Loading %s consumed %2.2fMiB memory",
            dataset_name,
            after_mem_usage - self._mem_usage[dataset_name],
        )
d
Can you type
breakpoint()
in the body of, say, the
before_dataset_loaded
hook?
d
sorry, you meant body of it
a
Agree with @Deepyaman Datta that the hooks are indeed running, but just not being logged (to either screen or file). The best fix would be this: 1. Completely delete the
def _logger
2. Change
self._<http://logger.info|logger.info>
to
logging.getLogger(__name__).info
Thank you for raising a PR to fix the docs. The quoted output from
kedro run
will look a bit different now, so if you donโ€™t mind updating that at the same time that would be amazing, thank you! ๐Ÿ™
d
Thanks for your help, this is to confirm it is now shown in BOTH console log and logs (don't mind it shows in both ๐Ÿ™‚ ) by implementing your solution.
๐Ÿฅณ 1
Two questions, 1. do you want me to raise PR to update the sample code with your solution? (I am more than happy to do so) 2. Regarding your second comment "`kedro run`` will look different and would prefer an update as well -> I am not sure what you are referring to but I assume it is referring to the console log and it looks like the following with your solution. If that's the case, happy to do this update as well and will mention you in the PR so you can communicate internally. Please let me your thoughts on 1&2, cheers.
Copy code
[01/25/23 21:38:23] INFO     Loading data from 'example_iris_data' (CSVDataSet)...                                                                                                                                                                                    data_catalog.py:343
                    INFO     Loading example_iris_data consumed 0.99MiB memory                                                                                                                                                                                                hooks.py:67
                    INFO     Loading data from 'parameters' (MemoryDataSet)...                                                                                                                                                                                        data_catalog.py:343
                    INFO     Loading parameters consumed 0.48MiB memory                                                                                                                                                                                                       hooks.py:67
                    INFO     Running node: split: split_data([example_iris_data,parameters]) -> [X_train,X_test,y_train,y_test]                                                                                                                                               node.py:327
                    INFO     Saving data to 'X_train' (MemoryDataSet)...                                                                                                                                                                                              data_catalog.py:382
                    INFO     Saving data to 'X_test' (MemoryDataSet)...                                                                                                                                                                                               data_catalog.py:382
                    INFO     Saving data to 'y_train' (MemoryDataSet)...                                                                                                                                                                                              data_catalog.py:382
                    INFO     Saving data to 'y_test' (MemoryDataSet)...                                                                                                                                                                                               data_catalog.py:382
                    INFO     Completed 1 out of 3 tasks                                                                                                                                                                                                           sequential_runner.py:85
                    INFO     Loading data from 'X_train' (MemoryDataSet)...                                                                                                                                                                                           data_catalog.py:343
                    INFO     Loading X_train consumed 0.49MiB memory                                                                                                                                                                                                          hooks.py:67
                    INFO     Loading data from 'X_test' (MemoryDataSet)...
a
Good to hear, thanks for the update @Dustin. 1. Yes please! 2. Thatโ€™s exactly right - the console log formatting is just a bit different now compared to the log given in the docs at the moment, so if you donโ€™t mind updating that at the same time it would be very helpful.
d
on both 1&2 now and will mention you in PR
๐Ÿ‘ 1
j
Thanks @Dustin for the above PR (and the one you made last week!). We appreciate your help in tracking down those documentation gremlins ๐Ÿ‘พ
๐Ÿ‘ 1
๐Ÿ™ 1