Hey I am currently profiling my Kedro code in order to find Kedro #questions

Konstantin Kobs

02/13/2024, 11:24 AM

Hey, I am currently profiling my Kedro code in order to find bottlenecks. I found that in the `_call_node_run` function, the node run itself (filtering some datasets) took only about 5% of the overall function runtime, while the `after_node_run` hook took 95% of the runtime, even though I have no functions defined for this hook. I suspect that hooks that receive the input and output data as parameters add a large overhead, since the `before_dataset_loaded` hook does not add any overhead. I think this might be a problem with Pluggy, but maybe someone here has an idea of how to reduce this overhead. Thanks!

datajoely

02/13/2024, 11:24 AM

can you share your profiling output?

Konstantin Kobs

02/13/2024, 11:36 AM

These are the profiling results for one node (filtering datasets), together with the relevant Kedro-internal functions. They show that the node code takes only around 0.16s, while the `run_node` function overall takes around 11s.

Untitled.py
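For readers who want to produce a comparable breakdown, here is a minimal sketch using only the standard library's `cProfile` and `pstats`. It is not the profile from this thread: `fake_run_node`, `fake_node_func`, and `fake_after_node_run` are hypothetical stand-ins, and the sleeps only imitate a cheap node followed by an expensive step.

```python
import cProfile
import io
import pstats
import time


def fake_node_func():
    """Hypothetical stand-in for the actual node code."""
    time.sleep(0.01)


def fake_after_node_run():
    """Hypothetical stand-in for work done after the node (e.g. a hook call)."""
    time.sleep(0.05)


def fake_run_node():
    """Hypothetical wrapper that runs the node and then the follow-up step."""
    fake_node_func()
    fake_after_node_run()


def cumulative_times(profiler, names):
    """Return {function name: cumulative seconds} for the requested functions."""
    stats = pstats.Stats(profiler, stream=io.StringIO())
    result = {}
    # stats.stats maps (filename, lineno, funcname) -> (cc, nc, tottime, cumtime, callers)
    for (_file, _line, func_name), (_cc, _nc, _tt, cumtime, _callers) in stats.stats.items():
        if func_name in names:
            result[func_name] = cumtime
    return result


profiler = cProfile.Profile()
profiler.enable()
fake_run_node()
profiler.disable()

times = cumulative_times(
    profiler, {"fake_run_node", "fake_node_func", "fake_after_node_run"}
)
# Print each function's share of the wrapper's cumulative time.
for name, seconds in sorted(times.items(), key=lambda item: -item[1]):
    share = seconds / times["fake_run_node"]
    print(f"{name}: {seconds:.3f}s ({share:.0%} of fake_run_node)")
```

Comparing cumulative times this way makes it easy to see whether the wrapper's runtime comes from the node itself or from the code around it.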

datajoely

02/13/2024, 12:50 PM

Are you using any other plugins like `kedro-mlflow`?

Konstantin Kobs

02/13/2024, 1:29 PM

Not `kedro-mlflow`, but we have a Great Expectations plugin, which I have disabled, however. When I step through the code in a debugger, the hook manager does not have any plugins registered. Are there any "hidden" plugins that could be in there?

datajoely

02/13/2024, 1:30 PM

Can you run `pip freeze | grep kedro` to see if there are any other plug-ins installed?
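The same check can also be done from inside the Python environment that runs the pipeline. The sketch below uses the standard library's `importlib.metadata`. The helper names are hypothetical, and the entry point group `kedro.hooks` is an assumption about how Kedro auto-discovers hook plugins, so please verify it against the Kedro documentation for your version.

```python
from importlib import metadata


def installed_kedro_packages():
    """Return sorted 'name==version' strings for installed packages with 'kedro' in the name."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and "kedro" in name.lower():
            found.add(f"{name}=={dist.version}")
    return sorted(found)


def hook_entry_points(group="kedro.hooks"):
    """Return sorted 'name = value' strings for entry points in the given group.

    The default group name is an assumption, not confirmed in this thread.
    """
    found = set()
    for dist in metadata.distributions():
        for entry_point in dist.entry_points:
            if entry_point.group == group:
                found.add(f"{entry_point.name} = {entry_point.value}")
    return sorted(found)


packages = installed_kedro_packages()
hooks = hook_entry_points()

print("Installed Kedro-related packages:")
for line in packages:
    print("  " + line)

print("Entry points in the assumed hook group:")
for line in hooks:
    print("  " + line)
```

If the second list is empty, no installed package advertises hooks through that group, which would be consistent with the hook manager showing no registered plugins.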

Nok Lam Chan

02/13/2024, 1:42 PM

There isn't anything hidden. Kedro has first-party plugins, but they are all just `pluggy` plugins, so they should show up. It may be helpful if you can share all the `debug`-level logging, since I think we have the hook trace enabled. For now I don't think this is a pluggy bug. Is 11 seconds long, based on your understanding? A node usually spends most of its time on:

Nok Lam Chan

02/13/2024, 1:42 PM

1. I/O (loading and saving the data, if you choose to persist the data on disk or remote storage)
2. Node function
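To see how the runtime splits across these two categories, one option is to time the load, the node function, and the save separately. The following is a minimal sketch and not Kedro code: `load_data`, `filter_rows`, and `save_data` are hypothetical stand-ins, and the sleeps merely simulate I/O latency.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Accumulate the wall-clock time of the enclosed block under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)


def load_data():
    """Hypothetical loader; the sleep simulates reading from disk or remote storage."""
    time.sleep(0.02)
    return list(range(1000))


def filter_rows(rows):
    """Hypothetical node function: keep even values only."""
    return [row for row in rows if row % 2 == 0]


def save_data(rows):
    """Hypothetical saver; the sleep simulates writing the result."""
    time.sleep(0.02)


with timed("load"):
    rows = load_data()
with timed("node function"):
    filtered = filter_rows(rows)
with timed("save"):
    save_data(filtered)

total = sum(timings.values())
for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f}s ({seconds / total:.0%})")
```

If load and save dominate, the time is going into I/O rather than into the node logic.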
