# questions
r: Hi all, what would be the best way to time-profile a Kedro pipeline?
Joel: This should get you started
r: Thanks, Joel!
Would this work with Spark datasets? I don't care about the dataset size, just run time.
d: Yeah, the point is that you can start a timer in one hook and pick it up in another.
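A minimal sketch of that idea, assuming a hooks class registered in `settings.py` (class and attribute names here are just illustrative):

```python
import time

from kedro.framework.hooks import hook_impl


class NodeTimerHooks:
    """Times each node by sharing state between two hook methods."""

    def __init__(self):
        self._starts = {}     # node name -> start timestamp
        self.durations = {}   # node name -> elapsed seconds

    @hook_impl
    def before_node_run(self, node):
        # start the timer for this node
        self._starts[node.name] = time.perf_counter()

    @hook_impl
    def after_node_run(self, node):
        # pick the timer up again and store the elapsed time
        self.durations[node.name] = time.perf_counter() - self._starts.pop(node.name)
```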
r: Great. I will proceed to actually read it now.
d: Hooks expose the lifecycle of a run, so you can do lots of cool stuff with them.
r: Awesome. I may need to simplify this to be able to run it in Databricks.
n: You don't need the extra infrastructure just for profiling; the same thing can be done with a simple dictionary.
r: Makes sense. What would be the best way/hook to report the run times after the pipeline run?
n: Essentially, you record start/end for each node and report that at the end of the pipeline using `after_pipeline_run`. For a really simple case, it's fine to just print it out as a dictionary. If you want to take this further, it may be easier to make it a DataFrame so you can sort it easily, add highlighting, etc.
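Putting that together, one possible shape for the reporting (the class name and the pandas formatting are just one way to do it):

```python
import time

import pandas as pd
from kedro.framework.hooks import hook_impl


class PipelineTimingHooks:
    """Records per-node run times and prints a report after the pipeline run."""

    def __init__(self):
        self._starts = {}
        self._durations = {}

    @hook_impl
    def before_node_run(self, node):
        self._starts[node.name] = time.perf_counter()

    @hook_impl
    def after_node_run(self, node):
        self._durations[node.name] = time.perf_counter() - self._starts.pop(node.name)

    @hook_impl
    def after_pipeline_run(self):
        # report all node run times once the pipeline has finished, slowest first
        report = pd.DataFrame(
            list(self._durations.items()), columns=["node", "seconds"]
        ).sort_values("seconds", ascending=False)
        print(report.to_string(index=False))
```

Registering it with `HOOKS = (PipelineTimingHooks(),)` in the project's `settings.py` is all that's needed; since it only measures wall-clock time around each node, it doesn't depend on the dataset type, Spark or otherwise.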