# questions
d
Hey all, I'm running into a curious situation: when running a Kedro pipeline in Databricks and saving the results to MLflow (through the kedro-mlflow plugin), occasionally some parallel code will trigger a new run in the experiment. The biggest example is hyperparameter optimization with Optuna using n_jobs=-1 for parallel execution: out of 100 trials, maybe ~4 will randomly trigger a new MLflow run inside the experiment (the other trials run normally without triggering new runs). This is driving me nuts. Any guess on possible causes?
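For context, a rough sketch of the kind of node where this happens (the objective and names are made up; the relevant bits are the 100 trials and n_jobs=-1 inside a pipeline whose run is already tracked by kedro-mlflow):
```python
import optuna


def objective(trial: optuna.Trial) -> float:
    # Dummy objective just for illustration.
    x = trial.suggest_float("x", -10.0, 10.0)
    return (x - 2.0) ** 2


def tune_hyperparameters() -> dict:
    # 100 trials run across all cores; a handful of them end up
    # spawning their own MLflow runs in the experiment.
    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=100, n_jobs=-1)
    return study.best_params
```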
d
Found it! Databricks enables autologging, and all the parallel stuff must be causing a desync at some point. Possibly an MLflow bug? Either way, I just need to disable it with a hook. @Nok Lam Chan might be worth putting something like this hook in the databricks starter:
```python
import mlflow
from kedro.framework.hooks import hook_impl


class DisableMLFlowAutoLogger:
    @hook_impl(tryfirst=True)
    def after_context_created(self, context) -> None:
        # Turn off Databricks' automatic MLflow logging before the pipeline runs.
        mlflow.autolog(disable=True)
```
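For anyone finding this later, a minimal sketch of registering the hook in the project's settings.py (the import path is an assumption; point it at wherever you put the hook class):
```python
# settings.py -- register the hook so Kedro applies it on every run.
# "my_project.hooks" is a placeholder module path.
from my_project.hooks import DisableMLFlowAutoLogger

HOOKS = (DisableMLFlowAutoLogger(),)
```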
❤️ 1
n
Is this autologging enabled by default? From the docs it seems to be something you need to enable: https://mlflow.org/docs/latest/tracking/autolog.html
d
It's enabled by default on Databricks (see https://docs.databricks.com/en/mlflow/databricks-autologging.html). Since most people run Kedro on Databricks through notebooks, this conflict might appear.
👀 1
n
Interesting, any idea why this only triggers randomly on a subset of the parallel runs but not all of them?
d
No idea, but my guess is some desync or race condition in how MLflow passes the run parameters across the cluster. I doubt the problem comes from Kedro.
💡 1
👍 1
y
Can you open an issue in kedro-mlflow with your proposed solution (a link to this conversation is enough)? I'm inclined to add it by default in the plugin.
👍🏼 1
👍 1
d