# questions
c
Hi all, I am facing a problem for which I know a workaround, but I would like a cleaner solution. I am using the kedro-mlflow integration to track my metrics in MLflow. I calculate these in a function, let's call it `calculate_metrics`. My issue is that I use this function for both the validation set and the test set, since both are calculated the same way. Now I want to save the results individually, as `metric_model_validation` or `metric_model_test`. Of course I could turn this into a modular pipeline with different input parameters, but I think that is overengineered here, because all I need is a fixed variable that says whether the function is used for test or validation, so I can add it to the metric name. In my solution, the params would just include `metric_calculation: {case: validation}`. I hope asking this is appropriate, since I already have a solution, but maybe somebody knows a better way to solve this.
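For reference, a minimal sketch of this workaround, assuming a standard Kedro pipeline setup (the dataset names, the `metric_calculation.case` parameter path, and the accuracy metric are illustrative, not from the thread):

```python
# pipeline.py -- hypothetical sketch: the shared calculate_metrics function
# receives the "case" string from parameters.yml as a regular node input.
import mlflow
from kedro.pipeline import Pipeline, node


def calculate_metrics(predictions, targets, case: str) -> dict:
    """Compute a metric and log it to MLflow under a case-specific name."""
    accuracy = float((predictions == targets).mean())
    mlflow.log_metric(f"metric_model_{case}", accuracy)
    return {"accuracy": accuracy}


def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        [
            node(
                calculate_metrics,
                # "params:metric_calculation.case" resolves to "validation"
                # given the params shown above; a second node with a "test"
                # parameter would reuse the same function.
                inputs=["val_predictions", "val_targets",
                        "params:metric_calculation.case"],
                outputs="metrics_validation",
                name="calculate_validation_metrics",
            ),
        ]
    )
```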
n
In this case, can you have only one dataset, `metric_model`? And maybe this parameter is then passed into the metadata of `metric_model`, something like:

```yaml
# catalog.yml
MetricDataSet:
  path: xxxx/xxxxx/{case}/metric.json
```
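A concrete variant of this idea (a sketch, not the thread's exact setup): plain YAML anchors let the validation and test cases share one definition, without any templating. The dataset type and file paths below are placeholders; entries prefixed with `_` are ignored by the Kedro catalog, which makes them useful as anchors:

```yaml
# catalog.yml -- hypothetical sketch: one shared definition, two cases.
_metric_model: &metric_model
  type: json.JSONDataset   # placeholder dataset type

metric_model_validation:
  <<: *metric_model
  filepath: data/08_reporting/validation/metric.json

metric_model_test:
  <<: *metric_model
  filepath: data/08_reporting/test/metric.json
```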
c
Maybe, but I currently track the metric via `log_metric` and not in a dataset. The easiest way would be to pass a parameter directly to the node, but I think that is not in accordance with Kedro's data-handling concept.
n
IMO, it's perfectly fine. Kedro has a data-centric flow, but it's not the end of the world not to follow it. Similarly, you don't have a `LogDataSet`; you simply call `logger.info`. The only reason inputs/outputs are required is that Kedro needs to figure out the dependencies between nodes from them. If no downstream node consumes that data, it is perfectly fine. An alternative here is using a hook to do this, as sketched below.
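A minimal sketch of that hook alternative, assuming the nodes return their metrics as dict outputs named `metrics_<case>` (all names here are illustrative):

```python
# hooks.py -- hypothetical sketch: instead of calling mlflow inside nodes,
# a hook inspects node outputs and logs any "metrics_*" dict to MLflow.
import mlflow
from kedro.framework.hooks import hook_impl


class MlflowMetricsHook:
    @hook_impl
    def after_node_run(self, node, outputs):
        for name, value in outputs.items():
            # e.g. outputs named "metrics_validation" or "metrics_test"
            if name.startswith("metrics_") and isinstance(value, dict):
                case = name.removeprefix("metrics_")
                for metric_name, metric_value in value.items():
                    mlflow.log_metric(f"{metric_name}_{case}", metric_value)
```

The hook would still need to be registered in the project's `settings.py`, e.g. `HOOKS = (MlflowMetricsHook(),)`.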