Hello guys, I noticed that there is no support for `log_table` (Kedro #plugins-integrations)

Bibo Bobo

02/16/2025, 12:18 PM

Hello, guys, I noticed that there is no support for the `log_table` method in kedro-mlflow. So I wonder what would be the right way to log additional data from a node, something that is not yet supported by the plugin? Right now I just do something like this at the end of the node function:

mlflow.log_table(data_for_table, output_filename)

But I am concerned, as I am not sure it will always work and always log the data to the correct run, because I was not able to retrieve the active run ID from inside the node with `mlflow.active_run()` (it returns `None` all the time). I need this because I want to use the `Evaluation` tab in the UI to manually compare some outputs of different runs.
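To make the concern above concrete, here is a minimal sketch of a guard that refuses to log when no run is active and passes the run ID explicitly. The helper name `log_table_to_active_run` and its two injectable parameters are hypothetical, added only so the guard can be tried without a tracking server. It relies on `mlflow.active_run()` and on the `run_id` argument of `mlflow.log_table`. As far as I know, MLflow's fluent logging calls start a new run when none is active, which is the situation this guard is meant to surface rather than silently accept.

```python
from typing import Any, Callable, Optional


def log_table_to_active_run(
    data: Any,
    artifact_file: str,
    get_active_run: Optional[Callable[[], Any]] = None,
    log_table: Optional[Callable[..., None]] = None,
) -> str:
    """Log `data` as an MLflow table, but only into an already active run.

    `get_active_run` and `log_table` default to `mlflow.active_run` and
    `mlflow.log_table`; they are parameters (an assumption of this sketch)
    so the guard can be exercised with stand-ins.
    """
    if get_active_run is None or log_table is None:
        import mlflow  # imported lazily so the helper loads without MLflow

        get_active_run = get_active_run or mlflow.active_run
        log_table = log_table or mlflow.log_table

    run = get_active_run()
    if run is None:
        # Fail loudly instead of letting the data end up in an unexpected run.
        raise RuntimeError("No active MLflow run; refusing to log the table.")

    run_id = run.info.run_id
    # Passing run_id explicitly pins the table to the run we just inspected.
    log_table(data, artifact_file, run_id=run_id)
    return run_id
```

Inside a node this would be called as `log_table_to_active_run(data_for_table, output_filename)`. If it raises, the node is running outside the run that the plugin opened, which is worth investigating before relying on the bare call.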

Yolan Honoré-Rougé

02/16/2025, 12:59 PM

You can just return your table at the end of the node, and use a `MlflowArtifactDataset` combined with a `CSVDataset` in your catalog

Yolan Honoré-Rougé

02/16/2025, 12:59 PM

It will be logged automatically

Yolan Honoré-Rougé

02/16/2025, 1:01 PM

https://kedro-mlflow.readthedocs.io/en/stable/source/03_experiment_tracking/01_experiment_tracking/03_version_datasets.html#how-to-track-data-in-a-kedro-project

Bibo Bobo

02/16/2025, 1:02 PM

It won't work. I mean, it will log the artifact for sure, but it will not be accessible in the `Evaluation` tab. As far as I understand, it should be logged via the `mlflow.log_table` method to appear in the datasets available for the `Evaluation` tab.

Yolan Honoré-Rougé

02/16/2025, 1:06 PM

Even if you use a JSON Dataset instead of a CSV one?

Yolan Honoré-Rougé

02/16/2025, 1:07 PM

But you are right, there is no support for `log_table` right now. Please open an issue in the repo and I'll try to add it: https://github.com/Galileo-Galilei/kedro-mlflow

Bibo Bobo

02/16/2025, 1:10 PM

> Even if you use a JSON Dataset instead of a CSV one?

That's a fair question. I tried to use `pandas.JSONDataset` (because I have the data in a DataFrame) with `MlflowArtifactDataset`, and it produced some stringified JSON as a result, so it was not available in Evaluation either. Could you recommend which of the JSON datasets to try?

Bibo Bobo

02/16/2025, 1:10 PM

> Please open an issue in the repo and I'll try to add it

Sure, no problem

Bibo Bobo

02/16/2025, 1:26 PM

Actually, maybe the JSON wasn't stringified. It might have had a different format because MLflow uses something like:

{
  "columns": list[column_names], 
  "data": list[list[values]]
}

whereas pandas converts a DataFrame into this format:

{
  "[column_name]": list[values]
}

I can't check right now, but I'm almost sure this was the problem. So, one way to address it would be to manually convert a DataFrame into MLflow’s JSON format and then save it as you advised.
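The manual conversion described above can be sketched in plain Python. The function name `columns_to_split` is hypothetical, and the sketch assumes that `"data"` holds one inner list per row, as in pandas' `orient="split"` layout. Note that later in this thread the reshaping approach is reported not to work for the Evaluation tab, so this only illustrates the difference between the two shapes.

```python
import json
from typing import Any, Dict, List


def columns_to_split(table: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Convert {"col": [values]} into {"columns": [...], "data": [[row], ...]}."""
    columns = list(table)
    lengths = {len(values) for values in table.values()}
    if len(lengths) > 1:
        raise ValueError("All columns must have the same number of values.")
    # zip(*columns) transposes the column lists into row tuples.
    rows = [list(row) for row in zip(*(table[name] for name in columns))]
    return {"columns": columns, "data": rows}


example = {"question": ["q1", "q2"], "answer": ["a1", "a2"]}
print(json.dumps(columns_to_split(example)))
# {"columns": ["question", "answer"], "data": [["q1", "a1"], ["q2", "a2"]]}
```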

Yolan Honoré-Rougé

02/16/2025, 1:32 PM

I think it should. If I remember correctly, there is an `orient` argument, `df.to_json(orient=...)`, to specify how the conversion should be done

Bibo Bobo

02/16/2025, 1:34 PM

Yeah, this might work. Let me check

Bibo Bobo

02/16/2025, 2:18 PM

Nope, it doesn't work 😔 I will create a feature request in the repo

Bibo Bobo

02/16/2025, 2:33 PM

@Yolan Honoré-Rougé FYI, I've created a feature request https://github.com/Galileo-Galilei/kedro-mlflow/issues/634

Philipp Dahlke

02/17/2025, 12:32 AM

How about using a hook with the mlflow library? That's what I do atm. You will have access to the current run, which the `MlflowHook` from kedro-mlflow instantiated, via `mlflow.active_run`, and are able to retrieve the node outputs with the `after_node_run` method Kedro provides.
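A minimal sketch of the hook idea above, under a few assumptions: the class name `TableLoggingHook`, the `tables` mapping, and the injectable `log_table` parameter are all hypothetical, and the `ImportError` fallback exists only so the sketch can be read and tried without Kedro installed. It uses Kedro's `after_node_run` hook specification (a hook implementation may accept a subset of the spec's arguments) and `mlflow.log_table`.

```python
from typing import Any, Callable, Dict, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch can be imported without Kedro
    def hook_impl(func):
        return func


class TableLoggingHook:
    """Log selected node outputs with mlflow.log_table after each node runs."""

    def __init__(
        self,
        tables: Dict[str, str],
        log_table: Optional[Callable[..., None]] = None,
    ) -> None:
        # Maps a Kedro output dataset name to the artifact file to log it under.
        self._tables = tables
        # Optional stand-in for mlflow.log_table (an assumption of this sketch).
        self._log_table = log_table

    @hook_impl
    def after_node_run(self, node: Any, outputs: Dict[str, Any]) -> None:
        for name, data in outputs.items():
            artifact_file = self._tables.get(name)
            if artifact_file is None:
                continue  # not an output we were asked to log
            log_table = self._log_table
            if log_table is None:
                import mlflow  # imported lazily, only when something is logged

                log_table = mlflow.log_table
            log_table(data, artifact_file)
```

The hook would then be registered in the project's `settings.py`, for example `HOOKS = (TableLoggingHook({"evaluation_table": "evaluation_table.json"}),)`, where `evaluation_table` is a placeholder for the name of the node output to log.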

Bibo Bobo

02/17/2025, 12:47 PM

@Philipp Dahlke Yeah, thank you, I think it makes sense too. I just thought that it is probably overkill for now, since the simple call to `mlflow.log_table` does the trick. I just don't like this as a long-term solution. So if there is no update in the plugin by the time I have problems with it, I will probably use a Hook or some other workaround
