# plugins-integrations
l
hello kedro team 😊 we have difficulty setting the `long_params_strategy` parameter to `tag` (kedro-mlflow==0.11.10, kedro~=0.18.9). As you can see, in our set-up, instead of going to `set_tag`, the parameter is sent to be logged as a parameter. Do you have any idea why this is occurring? Thank you in advance ☺
m
The code literally shows the logic. What is your expected output? The `long_params_strategy` only applies if the parameters don't fit within the `MAX_PARAM_VAL_LENGTH`.
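A simplified sketch of that branching, assuming the three strategies kedro-mlflow documents for `long_params_strategy` (`fail`, `truncate`, `tag`). `route_param` and its return tuples are hypothetical: the sketch does not call mlflow and is not the plugin's actual hook code. It only reports what would be logged, to show that the strategy is never consulted while the value fits under the limit.

```python
from typing import Any, Tuple

# Assumption: 6000 is the value of mlflow.utils.validation.MAX_PARAM_VAL_LENGTH
# in mlflow releases from late 2023 onward; older releases used 500.
MAX_PARAM_VAL_LENGTH = 6000


def route_param(name: str, value: Any, strategy: str,
                max_length: int = MAX_PARAM_VAL_LENGTH) -> Tuple[str, str, str]:
    """Return (action, name, value) describing how a parameter would be logged.

    Hypothetical helper: it reproduces the branching only, not the real hook.
    """
    str_value = str(value)
    # The strategy is only looked at when the value is too long.
    if len(str_value) <= max_length:
        return ("log_param", name, str_value)
    if strategy == "fail":
        raise ValueError(
            f"Parameter '{name}' has length {len(str_value)}, "
            f"which exceeds the limit of {max_length}"
        )
    if strategy == "tag":
        return ("set_tag", name, str_value)
    if strategy == "truncate":
        return ("log_param", name, str_value[:max_length])
    raise ValueError(f"Unknown long_params_strategy: {strategy!r}")
```

For instance, `route_param("columns", "x" * 620, "tag")` returns a `"log_param"` action because 620 characters fit under 6000; only with `max_length=500` does the same call return `"set_tag"`.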
y
Can you explain exactly what you are trying to do, and share your `mlflow.yml`? Kedro-mlflow's `long_params_strategy` only applies if logging as a parameter would raise an error (mostly because some backends do not support long strings). If you want to "force" not logging an input as a parameter, you can create an arbitrary `YAMLDataset` in your catalog and move your parameters to this dataset. If your long parameter is a `dict`, you can use the `flatten_dict` key to split it into several smaller parameters.
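To illustrate what flattening buys you, here is a minimal stdlib sketch that turns one nested dict into several short `a.b.c` keys. It assumes a `.` separator and recursive flattening; `flatten_params` is a hypothetical name, not kedro-mlflow's implementation. Note that only dict values get split: a flat list (such as a list of column names) stays a single value.

```python
from typing import Any, Dict


def flatten_params(params: Dict[str, Any], parent_key: str = "",
                   sep: str = ".") -> Dict[str, Any]:
    """Recursively flatten nested dicts into {"a.b.c": value} pairs."""
    flat: Dict[str, Any] = {}
    for key, value in params.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse so each leaf becomes its own, shorter parameter.
            flat.update(flatten_params(value, new_key, sep))
        else:
            # Non-dict values (including lists) are kept whole.
            flat[new_key] = value
    return flat


params = {"model": {"lr": 0.01, "layers": {"hidden": 64, "dropout": 0.2}}}
print(flatten_params(params))
# {'model.lr': 0.01, 'model.layers.hidden': 64, 'model.layers.dropout': 0.2}
```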
l
Hello @marrrcin and @Yolan Honoré-Rougé, thank you for your inputs. 😊 The parameter we are trying to log is a long list of columns to extract from an SQL table for data processing. It does exceed the `MAX_PARAM_VAL_LENGTH`, as the call to `mlflow.log_param` returns an error message:
`INVALID_PARAMETER_VALUE Param value ... had length 620, which exceeded length limit of 500`
Our expected output would be for this parameter to be logged as a `tag`, since we have specified the `long_params_strategy`. However, it gets sent to mlflow to be logged as a parameter, hence the error message.
and here is our mlflow.yaml (with specific project info removed)
y
This is definitely a very weird and interesting bug! Can you share some extra info:
• mlflow version
• mlflow setup (do you use a local database, the filesystem, or a remote server? What does your mlflow tracking URI start with? `file:///`, `sqlite:///`, `https://`?)
• what your SQL requests look like in `parameters.yml`: a multiline string with triple quotes or a "normal" raw string?
• Can you create a minimal reproducible example with e.g. the pandas-spaceflights starter?
• Can you try to set a breakpoint / print statement in the hook to show `str_value_length` and `MAX_PARAM_VAL_LENGTH`?
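For the print-statement check in the last bullet, a sketch that reports both numbers for a given value. It assumes `MAX_PARAM_VAL_LENGTH` is importable from `mlflow.utils.validation` (the module holding the constant in recent mlflow versions) and falls back to `None` when it is not; `param_length_report` is a hypothetical helper, not part of kedro-mlflow.

```python
from typing import Any, Optional

try:
    # Client-side limit shipped with the locally installed mlflow version.
    from mlflow.utils.validation import MAX_PARAM_VAL_LENGTH
except ImportError:  # mlflow missing, or the constant moved in this version
    MAX_PARAM_VAL_LENGTH = None


def param_length_report(name: str, value: Any,
                        limit: Optional[int] = MAX_PARAM_VAL_LENGTH) -> str:
    """Describe how long str(value) is compared with the given limit."""
    str_value_length = len(str(value))
    if limit is None:
        return f"{name}: str_value_length={str_value_length}, limit unknown"
    status = "fits" if str_value_length <= limit else "too long"
    return (f"{name}: str_value_length={str_value_length}, "
            f"MAX_PARAM_VAL_LENGTH={limit} ({status})")


print(param_length_report("columns", "x" * 620))
```

Comparing this output with the limit in the server's error message shows whether client and server agree.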
Gotcha: https://github.com/mlflow/mlflow/blob/9b9357efdbf183fa6c5bb2432a31f24b724e05d4/mlflow/utils/validation.py#L53 For some reason they have updated the value to 6000, but it still seems to be failing on mlflow's end... Can you open a bug in the mlflow repo? I'll hardcode the value in kedro-mlflow in the meantime.
> https://github.com/mlflow/mlflow/blob/9b9357efdbf183fa6c5bb2432a31f24b724e05d4/docs/source/tracking/backend-stores.rst#supported-store-types
> Note
>
> In Sep 2023, we increased the max length for params recorded in a Run from 500 to 8k (but we limit param value max length to 6000 internally). mlflow/2d6e25af4d3e_increase_max_param_val_length is a non-invertible migration script that increases the cap in existing database to 8k. Please be careful if you want to upgrade and backup your database before upgrading.
I suspect a mismatch between your local mlflow version (which supports parameter values up to 6000 characters) and the server one (which is limited to 500 because it is an older version). I should eventually let people configure this because it's the second time someone has complained about it; on the other hand, it's been 3 years and only 2 people have complained 😉
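The suspected mismatch, in numbers: the length check runs on the client with the installed mlflow's limit (6000), but an older server enforces 500. A 620-character value therefore passes the client check, never reaches `long_params_strategy`, and is rejected by the server. The sketch below uses hypothetical names (`CLIENT_LIMIT`, `SERVER_LIMIT`, `simulate_log_param`). Passing the limit as an argument hints at how a configurable limit could look; it is not an existing kedro-mlflow option.

```python
CLIENT_LIMIT = 6000  # limit in the locally installed mlflow (assumption)
SERVER_LIMIT = 500   # limit enforced by an older tracking server (assumption)


def simulate_log_param(value: str, strategy: str, client_limit: int) -> str:
    """Return what happens to `value` under a client-side length check."""
    if len(value) > client_limit:
        # Only past the client-side limit does long_params_strategy apply.
        return f"handled by strategy '{strategy}'"
    if len(value) > SERVER_LIMIT:
        # The client believed the value fits, so log_param reaches the server.
        return "server error: INVALID_PARAMETER_VALUE"
    return "logged as param"


value = "x" * 620
print(simulate_log_param(value, "tag", CLIENT_LIMIT))  # server error: INVALID_PARAMETER_VALUE
print(simulate_log_param(value, "tag", SERVER_LIMIT))  # handled by strategy 'tag'
```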
The only workaround I envision is to store your SQL query in a YAML / text file and treat it as a dataset instead of a param. You can make it an `MLflowArtifactDataset` if you want to ensure it is logged in mlflow. I accept PRs if you want a fix released fast!
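A stdlib-only sketch of that workaround: the column list lives in a file that the pipeline loads as data, so it never passes through the parameter-logging hook. The file name `sql_columns.txt` and the functions `load_columns` / `build_query` are hypothetical. In a real Kedro project the file would be declared in the catalog (optionally wrapped in `MLflowArtifactDataset` to log it to mlflow), and the catalog, not this code, would load it.

```python
import tempfile
from pathlib import Path
from typing import List


def load_columns(path: Path) -> List[str]:
    """Read one column name per line, skipping blank lines."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def build_query(table: str, columns: List[str]) -> str:
    """Node-like function: the columns arrive as data, not as params.

    Column names are assumed to come from trusted project files.
    """
    return f"SELECT {', '.join(columns)} FROM {table}"


# Usage: write a sample column file, then load it as a dataset would be loaded.
with tempfile.TemporaryDirectory() as tmp:
    columns_file = Path(tmp) / "sql_columns.txt"
    columns_file.write_text("id\ncustomer_name\norder_date\n")
    query = build_query("orders", load_columns(columns_file))

print(query)  # SELECT id, customer_name, order_date FROM orders
```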
l
amazing, thanks so much for looking into it!! ⭐ I will check with my team whether we go for `MLflowArtifactDataset` or for a PR.