# questions
r: Hi all, what would be the best way to time-profile a Kedro pipeline?
Joel: This should get you started
r: Thanks, Joel!
Would this work with Spark datasets? I don't care about the dataset size, just run time.
d: Yeah, the point is that you can start a timer in one hook and pick it up in another.
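A minimal sketch of that idea, assuming a hooks class registered in `settings.py` (class and attribute names here are just illustrative):

```python
import time

from kedro.framework.hooks import hook_impl


class NodeTimerHooks:
    """Times each node by sharing state between two hook methods."""

    def __init__(self):
        self._starts = {}     # node name -> start timestamp
        self.durations = {}   # node name -> elapsed seconds

    @hook_impl
    def before_node_run(self, node):
        # start the timer for this node
        self._starts[node.name] = time.perf_counter()

    @hook_impl
    def after_node_run(self, node):
        # pick the timer up again and store the elapsed time
        self.durations[node.name] = time.perf_counter() - self._starts.pop(node.name)
```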
r: Great. I will proceed to actually read it now.
d: Hooks expose the lifecycle of a run, so you can do lots of cool stuff with them.
r: Awesome. I may need to simplify this to be able to run it in Databricks.
n: You don't need the extra infrastructure just for profiling; the same thing can be done with a simple dictionary.
r: Makes sense. What would be the best way/hook to report the run times after the pipeline run?
n: Essentially, you record start/end for each node and report that at the end of the pipeline using `after_pipeline_run`. For a really simple case, it's fine to just print it out as a dictionary. If you want to take this further, it may be easier to make it a DataFrame so you can sort it easily, add highlighting, etc.
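Putting that together, one possible shape for the reporting (the class name and the pandas formatting are just one way to do it):

```python
import time

import pandas as pd
from kedro.framework.hooks import hook_impl


class PipelineTimingHooks:
    """Records per-node run times and prints a report after the pipeline run."""

    def __init__(self):
        self._starts = {}
        self._durations = {}

    @hook_impl
    def before_node_run(self, node):
        self._starts[node.name] = time.perf_counter()

    @hook_impl
    def after_node_run(self, node):
        self._durations[node.name] = time.perf_counter() - self._starts.pop(node.name)

    @hook_impl
    def after_pipeline_run(self):
        # report all node run times once the pipeline has finished, slowest first
        report = pd.DataFrame(
            list(self._durations.items()), columns=["node", "seconds"]
        ).sort_values("seconds", ascending=False)
        print(report.to_string(index=False))
```

Registering it with `HOOKS = (PipelineTimingHooks(),)` in the project's `settings.py` is all that's needed; since it only measures wall-clock time around each node, it doesn't depend on the dataset type, Spark or otherwise.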