# plugins-integrations
Hello kedro team 😊 we are having difficulty setting the
parameter to
(kedro-mlflow==0.11.10, kedro~=0.18.9). As you can see, in our setup, instead of being passed to set_tag, the parameter is sent to be logged as a parameter. Do you have any idea why this is happening? Thank you in advance ☺
The code literally shows the logic. What is your expected output? The
only applies if the parameters don't fit within the
Can you explain exactly what you are trying to do, and share your
? Kedro-mlflow
only applies if logging as a parameter would raise an error (mostly because some backends do not support long strings). If you want to force an input not to be logged as a parameter, you can create an arbitrary "YAMLDataset" in your catalog and move your parameters to this dataset. If your long parameter is a
, you can use
key to split it into several smaller parameters.
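A minimal sketch of the routing logic described above, assuming a hook that stringifies each parameter, compares it to a fixed length limit, and only then applies a fallback strategy. The strategy names (`"fail"`, `"tag"`, `"truncate"`), the 500-character limit and the helpers `route_param` and `flatten_dict` are hypothetical illustrations, not kedro-mlflow's actual code:

```python
from typing import Any, Dict, Tuple

# Assumed backend limit (hypothetical), matching the error reported in this thread.
MAX_PARAM_VAL_LENGTH = 500


def flatten_dict(d: Dict[str, Any], sep: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Recursively split a nested dict into several small 'a.b.c'-style entries."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_dict(value, sep=sep, prefix=full_key))
        else:
            flat[full_key] = value
    return flat


def route_param(name: str, value: Any, strategy: str = "fail",
                max_length: int = MAX_PARAM_VAL_LENGTH) -> Tuple[str, str]:
    """Return (destination, value), where destination is 'param' or 'tag'.

    The strategy is only consulted when the stringified value is too long
    to be logged as a parameter.
    """
    str_value = str(value)
    if len(str_value) <= max_length:
        return "param", str_value
    if strategy == "tag":
        return "tag", str_value
    if strategy == "truncate":
        return "param", str_value[:max_length]
    if strategy == "fail":
        # Logging it as-is would be rejected by the backend, so fail loudly.
        raise ValueError(
            f"Parameter '{name}' has length {len(str_value)}, "
            f"which exceeds the limit of {max_length}"
        )
    raise ValueError(f"Unknown strategy: {strategy!r}")


if __name__ == "__main__":
    columns = ",".join(f"column_{i:03d}" for i in range(60))  # 659 characters
    print(route_param("columns", columns, strategy="tag")[0])  # tag
    print(len(route_param("columns", columns, strategy="truncate")[1]))  # 500
```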
Hello @marrrcin and @Yolan Honoré-Rougé, thank you for your input. 😊 The parameter we are trying to log is a long list of columns to extract from an SQL table for data processing. It does exceed the
as the call to
returns an error message:
INVALID_PARAMETER_VALUE Param value ... had length 620, which exceeded length limit of 500
Our expected outcome is for this parameter to be logged as a
since we have set the
. However, it gets sent to mlflow to be logged as a parameter, hence the error message.
And here is our mlflow.yaml (with project-specific info removed).
This is definitely a very weird and interesting bug! Can you share some extra info:
• mlflow version
• mlflow setup (do you use a local database, the filesystem or a remote server? What does your mlflow tracking URI start with? file:///, sqlite:///, https://?)
• what your SQL requests look like in
? A multiline string with triple quotes or a "normal" raw string?
• Can you create a minimal reproducible example with e.g. the pandas-spaceflights starter?
• Can you try to set a breakpoint / print statement in the hook to show
Gotcha: https://github.com/mlflow/mlflow/blob/9b9357efdbf183fa6c5bb2432a31f24b724e05d4/mlflow/utils/validation.py#L53
For some reason they have updated the value to 6000, but it still seems to be failing on mlflow's end... Can you open a bug in the mlflow repo? I'll hardcode the value in kedro-mlflow in the meantime.
> https://github.com/mlflow/mlflow/blob/9b9357efdbf183fa6c5bb2432a31f24b724e05d4/docs/source/tracking/backend-stores.rst#supported-store-types
>
> Note
>
> In Sep 2023, we increased the max length for params recorded in a Run from 500 to 8k (but we limit param value max length to 6000 internally). mlflow/2d6e25af4d3e_increase_max_param_val_length is a non-invertible migration script that increases the cap in existing database to 8k. Please be careful if you want to upgrade and backup your database before upgrading.
I suspect a mismatch between your local mlflow version (which supports parameter values up to 6000 characters long) and the server's version (which is limited to 500 because it is older). I should eventually let people configure this, because it's the second time someone has complained about it; on the other hand, it's been 3 years and only 2 people have complained 😉
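To make the suspected mismatch concrete, here is an illustrative simulation, not mlflow or kedro-mlflow code. It assumes the client-side check uses the newer 6000-character limit, so a 620-character value is not considered "long" and is sent as a parameter, which an older server capped at 500 then rejects. `hook_log_param`, `server_log_param`, `InvalidParameterValue` and the `client_limit` argument are hypothetical; passing the server's limit as `client_limit` mimics the hardcoding / configuration fix mentioned above.

```python
from typing import Dict

CLIENT_MAX_PARAM_VAL_LENGTH = 6000  # newer client-side limit referenced in the thread
SERVER_MAX_PARAM_VAL_LENGTH = 500   # older server-side limit, as in the error message


class InvalidParameterValue(Exception):
    """Hypothetical stand-in for the INVALID_PARAMETER_VALUE server error."""


def server_log_param(key: str, value: str) -> None:
    """Hypothetical validation performed by an older tracking server."""
    if len(value) > SERVER_MAX_PARAM_VAL_LENGTH:
        raise InvalidParameterValue(
            f"Param value had length {len(value)}, "
            f"which exceeded length limit of {SERVER_MAX_PARAM_VAL_LENGTH}"
        )


def hook_log_param(key: str, value: str, tags: Dict[str, str],
                   client_limit: int = CLIENT_MAX_PARAM_VAL_LENGTH) -> str:
    """Hypothetical client-side routing: only values above client_limit become tags."""
    if len(value) > client_limit:
        tags[key] = value  # stands in for setting a tag instead of a param
        return "tag"
    server_log_param(key, value)  # may still be rejected by the server
    return "param"


if __name__ == "__main__":
    columns = "x" * 620
    try:
        hook_log_param("columns", columns, {})
    except InvalidParameterValue as err:
        # The 620-char value passed the 6000 check but hit the 500 cap.
        print(err)
    tags: Dict[str, str] = {}
    # Checking against the server's limit routes the value to a tag instead.
    print(hook_log_param("columns", columns, tags, client_limit=SERVER_MAX_PARAM_VAL_LENGTH))
```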
The only workaround I envision is to store your SQL query in a YAML / text file and treat it as a dataset instead of a param. You can make it a
if you want to ensure it gets logged in mlflow. I'd accept a PR if you want a fix released quickly!
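A sketch of that workaround, assuming the column list lives in a plain text file (one column per line) that the pipeline loads as a dataset rather than reading it from `params:`. The file layout, `load_columns`, `build_query` and the `sales` table are hypothetical; in a real project a catalog dataset would do the loading and the node would simply receive the list.

```python
import tempfile
from pathlib import Path
from typing import List


def load_columns(path: Path) -> List[str]:
    """Read one column name per line, skipping blank lines (stands in for a dataset load)."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def build_query(columns: List[str], table: str) -> str:
    """Hypothetical node: receives the columns as a dataset input, not as a params: entry."""
    return f"SELECT {', '.join(columns)} FROM {table}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "columns.txt"
        path.write_text("id\nname\nprice\n")
        print(build_query(load_columns(path), "sales"))  # SELECT id, name, price FROM sales
```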
Amazing, thanks so much for looking into it!! ⭐ I'll check with my team whether we go for MLflowArtifactDataset or a PR.