#questions

Konstantin Kobs

02/13/2024, 11:24 AM
Hey, I am currently profiling my Kedro code to find bottlenecks. I found that in the `_call_node_run` function, the node run itself (filtering some datasets) took only about 5% of the overall function runtime, while the `after_node_run` hook took 95%, even though I have no implementations defined for this hook. I suspect that hooks that receive the input and output data as parameters add a large overhead, since the `before_dataset_loaded` hook does not add any. I think this might be a problem with Pluggy, but maybe someone here has an idea of how to reduce this overhead. Thanks!
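The suspicion that hooks which receive input and output data are the slow ones can be illustrated with a stdlib-only sketch. It assumes, without verifying it against Kedro or pluggy here, that a hook tracer turns every keyword argument into text before passing the message to a writer such as `logger.debug`. `BigPayload`, `eager_trace` and `lazy_trace` are hypothetical names used only for illustration.

```python
import logging
import time

# Hypothetical logger standing in for a hook manager's trace writer.
logger = logging.getLogger("hook_trace_demo")
logger.setLevel(logging.INFO)  # DEBUG records are discarded


class BigPayload:
    """Stand-in for a large dataset whose text form is expensive to build."""

    str_calls = 0  # counts how often the expensive string is built

    def __init__(self, n_rows):
        self.rows = list(range(n_rows))

    def __str__(self):
        BigPayload.str_calls += 1
        # Touches every row, roughly like rendering a big DataFrame as text.
        return "\n".join(str(row) for row in self.rows)


def eager_trace(writer, hook_name, **kwargs):
    # The message is built before the writer runs, so its cost is paid
    # even when the writer (logger.debug here) then drops it.
    lines = [f"{hook_name}:"] + [f"    {key}: {value}" for key, value in kwargs.items()]
    writer("\n".join(lines))


def lazy_trace(hook_name, **kwargs):
    # logging only applies %-formatting if the record is actually emitted,
    # so nothing is stringified while DEBUG is disabled.
    logger.debug("%s: %s", hook_name, kwargs)


payload = BigPayload(200_000)

start = time.perf_counter()
eager_trace(logger.debug, "after_node_run", outputs=payload)
eager_seconds = time.perf_counter() - start
calls_after_eager = BigPayload.str_calls

start = time.perf_counter()
lazy_trace("after_node_run", outputs=payload)
lazy_seconds = time.perf_counter() - start
calls_after_lazy = BigPayload.str_calls

print(f"eager: {eager_seconds:.4f}s, __str__ calls: {calls_after_eager}")
print(f"lazy:  {lazy_seconds:.6f}s, __str__ calls: {calls_after_lazy}")
```

If a mechanism like this applies, the cost scales with the size of the data passed to the hook rather than with the number of hook implementations, which would be consistent with a slow `after_node_run` call that has no implementations registered.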

datajoely

02/13/2024, 11:24 AM
can you share your profiling output?

Konstantin Kobs

02/13/2024, 11:36 AM
These are the profiling results for one node (filtering datasets), including the relevant Kedro-internal functions. They show that the node code takes only around 0.16 s, while the `run_node` function takes around 11 s overall.
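A profile like the one described can be reproduced in miniature with the standard-library `cProfile` and `pstats` modules. The sketch below is not Kedro code: `filter_datasets`, `after_node_work` and `run_node` are hypothetical stand-ins, and `time.sleep` fakes the expensive post-node step. Sorting by cumulative time puts the wrapper at the top, and comparing its cumulative time with the node function's gives the same kind of ratio as 0.16 s out of 11 s (about 1.5%).

```python
import cProfile
import pstats
import time


def filter_datasets(rows):
    # Hypothetical node function: cheap in-memory filtering.
    return [row for row in rows if row % 2 == 0]


def after_node_work(outputs):
    # Hypothetical stand-in for expensive work triggered after the node.
    time.sleep(0.1)


def run_node(rows):
    # Hypothetical wrapper: node call followed by post-node work.
    outputs = filter_datasets(rows)
    after_node_work(outputs)
    return outputs


profiler = cProfile.Profile()
profiler.enable()
run_node(list(range(100_000)))
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(10)


def cumulative_seconds(stats, func_name):
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers).
    return sum(
        cumtime
        for (_, _, name), (_, _, _, cumtime, _) in stats.stats.items()
        if name == func_name
    )


node_s = cumulative_seconds(stats, "filter_datasets")
wrapper_s = cumulative_seconds(stats, "run_node")
print(f"node: {node_s:.3f}s of {wrapper_s:.3f}s ({100 * node_s / wrapper_s:.1f}%)")
```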

datajoely

02/13/2024, 12:50 PM
Are you using any other plugins, like `kedro-mlflow`?

Konstantin Kobs

02/13/2024, 1:29 PM
Not `kedro-mlflow`, but we have a Great Expectations plugin, which I have disabled. When I debug into the code, the hook manager does not have any plugins registered. Are there any "hidden" plugins that could be in there?

datajoely

02/13/2024, 1:30 PM
can you do a `pip freeze | grep kedro` to see if there are any other plug-ins installed?
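For a check that also works where `grep` is unavailable, roughly the same information can be read from Python's `importlib.metadata`. Besides the installed `kedro*` distributions, the sketch lists entry points in the `kedro.hooks` group; treat that group name as an assumption about how Kedro auto-registers plugin hooks, to be confirmed against the Kedro plugin docs for your version. Both function names are hypothetical.

```python
from importlib.metadata import distributions, entry_points


def installed_kedro_distributions():
    # Roughly what `pip freeze | grep kedro` shows, minus version pins.
    names = {dist.metadata["Name"] for dist in distributions()}
    return sorted(name for name in names if name and "kedro" in name.lower())


def kedro_hook_entry_points(group="kedro.hooks"):
    # Assumption: auto-registered plugin hooks are advertised under this group.
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a dict of group -> entry points
        selected = eps.get(group, [])
    return sorted(f"{ep.name} = {ep.value}" for ep in selected)


dists = installed_kedro_distributions()
hooks = kedro_hook_entry_points()
print("distributions:", dists)
print("hook entry points:", hooks)
```

An empty hook list, together with the empty registry seen in the debugger, would support the observation that no extra plugins are involved.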

Nok Lam Chan

02/13/2024, 1:42 PM
There isn't anything hidden. Kedro has first-party plugins, but they are all just `pluggy` plugins, so they should show up. It may help if you can share the full `debug`-level logging, since I think we have the hook trace enabled. For now, I don't think this is a pluggy bug. Based on your understanding, is 11 seconds long? The most significant time a node spends is on:
1. I/O (loading and saving the data, if you choose to persist it on disk/remote storage)
2. The node function
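The two cost centres listed above can be measured separately with a small timing harness. This is a sketch under assumptions: `load_input`, `node_function` and `save_output` are hypothetical stand-ins for a dataset load, the node function and a dataset save, and `time.sleep` simulates I/O latency.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    # Accumulate wall-clock seconds for one phase of a node run.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


def load_input():
    # Hypothetical I/O stand-in: simulate reading from disk/remote storage.
    time.sleep(0.05)
    return list(range(10_000))


def node_function(rows):
    # Hypothetical node logic: keep multiples of 3.
    return [row for row in rows if row % 3 == 0]


def save_output(rows):
    # Hypothetical I/O stand-in for persisting the result.
    time.sleep(0.05)


with timed("load"):
    data = load_input()
with timed("node"):
    result = node_function(data)
with timed("save"):
    save_output(result)

for phase, seconds in timings.items():
    print(f"{phase:>4}: {seconds:.3f}s")
```

If the load, node and save phases together account for only a small share of the measured wall-clock time, the remainder is being spent elsewhere in the run, for example in hook dispatch.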