Bibo Bobo
02/16/2025, 12:18 PM
[...] log_table method in kedro-mlflow. So I wonder what the right way is to log additional data from a node, something that is not yet supported by the plugin.
Right now I just do something like this at the end of the node function:
mlflow.log_table(data_for_table, output_filename)
But I am not sure it will always work and always log the data to the correct run, because I was not able to retrieve the active run ID from inside the node with mlflow.active_run() (it returns None all the time).
I need this because I want to use the Evaluation tab in the UI to manually compare some outputs of different runs.
Hall
02/16/2025, 12:18 PM
Yolan Honoré-Rougé
02/16/2025, 12:59 PM
Yolan Honoré-Rougé
02/16/2025, 12:59 PM
Bibo Bobo
02/16/2025, 1:02 PM
[...] Evaluation tab.
As far as I understand, it should be logged via the mlflow.log_table method to appear in the datasets available for the Evaluation tab.
Yolan Honoré-Rougé
02/16/2025, 1:06 PM
Yolan Honoré-Rougé
02/16/2025, 1:07 PM
Bibo Bobo
02/16/2025, 1:10 PM
"Even if you use a JSON Dataset instead of a CSV one?"
That's a fair question. I tried to use pandas.JSONDataset (because I have the data in a DataFrame) with MlflowArtifactDataset, and it produced some stringified JSON as a result, so it was not available in Evaluation either. Could you recommend which of the JSON datasets to try?
Bibo Bobo
02/16/2025, 1:10 PM
"Please open an issue in the repo and I'll try to add it"
Sure, no problem.
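On the JSON layout question, a minimal sketch of how pandas can emit different JSON layouts through the orient argument of DataFrame.to_json. The DataFrame contents and variable names are made up for illustration. Whether the "split" layout is what the Evaluation tab needs is the assumption discussed in the next message of this thread, not something verified here.

```python
import json

import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"prompt": ["a", "b"], "score": [0.5, 0.25]})

# pandas' default layout for a DataFrame (orient="columns"):
# each column maps to an {index: value} object.
default_layout = json.loads(df.to_json())

# orient="split" with index=False yields a "columns" list plus row-wise "data",
# which matches the {"columns": ..., "data": ...} shape quoted in this thread.
split_layout = json.loads(df.to_json(orient="split", index=False))

print(default_layout)
print(split_layout)
```

If this layout turns out to be the relevant one, the same orient value could presumably be passed through the dataset's save arguments, but that has not been checked against the plugin here.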
Bibo Bobo
02/16/2025, 1:26 PM
[...]
{
  "columns": list[column_names],
  "data": list[list[values]]
}
whereas pandas converts a DataFrame into this format:
{
  "[column_name]": list[values]
}
I can't check right now, but I'm almost sure this was the problem. So, one way to address it would be to manually convert a DataFrame into MLflow’s JSON format and then save it as you advised.
Yolan Honoré-Rougé
02/16/2025, 1:32 PM
[...] df.to_json(orient=...) argument to specify how the conversion should be done
Bibo Bobo
02/16/2025, 1:34 PM
Bibo Bobo
02/16/2025, 2:18 PM
Bibo Bobo
02/16/2025, 2:33 PM
Philipp Dahlke
02/17/2025, 12:32 AM
[...] MlflowHook from kedro-mlflow, instantiated via mlflow.active_run and are able to retrieve the node outputs with the after_node_run method Kedro provides.
Bibo Bobo
02/17/2025, 12:47 PM
[...] mlflow.log_table does the trick.
I just don't like this as a long-term solution. So if, by the time I have problems with it, there is no update in the plugin, I will probably use a Hook or some other workaround.
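For reference, a rough sketch of what the Hook workaround mentioned above might look like. The class name TableLoggingHook, the tables mapping, and the injectable log_table argument are all hypothetical and not part of Kedro or kedro-mlflow. The sketch also assumes an MLflow run is already active when after_node_run fires, which should be verified given the mlflow.active_run() behaviour reported earlier in this thread.

```python
from typing import Any, Callable, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can be read and exercised without Kedro installed.
    def hook_impl(func):
        return func


class TableLoggingHook:
    """Hypothetical hook that logs selected node outputs as MLflow tables."""

    def __init__(
        self,
        tables: dict[str, str],
        log_table: Optional[Callable[..., Any]] = None,
    ):
        # tables maps a Kedro dataset name to the artifact file to write.
        self._tables = tables
        # log_table is injectable for testing; defaults to mlflow.log_table.
        self._log_table = log_table

    def _get_logger(self) -> Callable[..., Any]:
        if self._log_table is None:
            import mlflow  # imported lazily so the class loads without MLflow

            self._log_table = mlflow.log_table
        return self._log_table

    @hook_impl
    def after_node_run(self, node, outputs):
        # outputs is the dict of dataset name -> data produced by the node.
        for dataset_name, artifact_file in self._tables.items():
            if dataset_name in outputs:
                self._get_logger()(
                    data=outputs[dataset_name], artifact_file=artifact_file
                )
```

Such a hook would typically be registered in the project's settings.py, for example HOOKS = (TableLoggingHook({"eval_table": "eval_table.json"}),), where the dataset and file names are placeholders. The node itself then stays free of MLflow calls.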