Hi I would like to be able to track certain parameters of my Kedro #questions

Jan

06/28/2024, 8:01 AM

Hi! I would like to be able to track certain parameters of my pipeline with kedro experiment tracking. I want them always to be tracked with each run, even if I just run a subset of the complete pipeline. What would be the best approach to do this? If I understand correctly we need to run a node "track_parameters" that tracks those parameters and saves them to a metrics dataset every time. I thought about hooks but I don't know how to modify the pipeline in the before_pipeline_run hook so that the node "track_parameters" is added if not present yet.

Elena Khaustova

06/28/2024, 8:17 AM

Hi Jan, your case is a common application of hooks. You can use them to inject additional behaviour at certain lifecycle points in Kedro’s main execution, so you do not need to create any additional nodes, and the pipeline doesn’t require any modifications. Please see the execution order here: https://docs.kedro.org/en/stable/hooks/introduction.html You can use the after_node_run hook to log your metrics after a node (or several nodes) has run, as in this example: https://docs.kedro.org/en/stable/hooks/examples.html#add-metrics-tracking-to-your-model

Jan

06/28/2024, 8:24 AM

Hi Elena, thanks for the hint. How exactly could I then create the tracked metric for Kedro experiment tracking? I tried kedro-mlflow, and that works well for tracking the parameters, but they are then tracked via MLflow. However, in MLflow I cannot directly compare artifacts (i.e. plots). Thus, I would like to use Kedro-Viz to compare two runs directly, showing the plots and the parameters used. What I am not sure about is how to log the metrics (to Kedro, not to MLflow) against the current run during the hook execution.

Elena Khaustova

06/28/2024, 8:41 AM

To enable experiment tracking with Kedro-Viz you should:
• Set up a session store to capture experiment metadata
• Set up experiment tracking datasets to list the metrics to track
• Modify your nodes and pipelines to output those metrics
These steps are described, with an example, here: https://docs.kedro.org/projects/kedro-viz/en/stable/experiment_tracking.html#when-should-i-use-experiment-tracking-in-kedro

Jan

06/28/2024, 8:46 AM

Thanks, I did set up experiment tracking with Kedro-Viz already. The sticking point for me is how to track the parameters as a metric via a hook. If I just create a node that tracks the parameters, it is not guaranteed that this node will run each time I run a given pipeline.

Elena Khaustova

06/28/2024, 9:10 AM

You can then move the experiment tracking nodes into a separate pipeline and run it together with the subset of the target pipeline, i.e. something like

kedro run experiment_tracking_pipeline + target_subset

Edit: the syntax above is not possible - it’s just to give an idea of splitting the pipeline. But you can sum pipeline objects, see an example below.

Elena Khaustova

06/28/2024, 9:12 AM

Otherwise, you will probably need to modify the data catalog in the after_node_run hook to save your metrics, rather than doing it in a separate node.

Jan

06/28/2024, 9:54 AM

Thanks, would the first option still work if I use a filter like --from-nodes? For the second option, I don't think this is possible. Or will modifications to the catalog object be reflected in the rest of the execution? I ask because the return type of the hook function is None.

Elena Khaustova

06/28/2024, 10:25 AM

1. You can use --from-nodes as long as your metrics tracking nodes remain in the pipeline after the slice. You do not need to split your pipeline for this.
2. The solution I mean is to split your pipeline into two and sum them:

custom_pipeline = (
    experiment_tracking_pipeline() + main_pipeline()
)

To further filter nodes you can apply tags: https://docs.kedro.org/en/stable/nodes_and_pipelines/nodes.html#how-to-tag-a-node So you run

kedro run -p custom_pipeline -t tracking,tag_a

Elena Khaustova

06/28/2024, 10:28 AM

Catalog modification at runtime is possible, but it’s not straightforward and we do not recommend it. Here is what you will need to do if you decide to follow this option: https://kedro-org.slack.com/archives/C03RKP2LW64/p1719401040476289?thread_ts=1719389062.377319&cid=C03RKP2LW64

Jan

06/28/2024, 1:17 PM

Alright, understood. I did use a tag, but the node was dropped by the --from-nodes filter. Putting it in another pipeline will probably fix that. Thanks a lot for the support 🙂
