# plugins-integrations
Hello, we are using kedro-mlflow (which is great for logging models, artifacts, metrics, and parameters when we do `kedro run` locally), but we are also using kedro-airflow, where we run each node in a DockerOperator. With kedro-airflow, each DAG step is executed separately and creates a new run ID (i.e. we have one run for the model training node, one for model evaluation, etc.): the pipeline is totally fragmented. This is a real issue, and we would like to have everything in one run (even for a multi-step DAG). How can we achieve this? Thank you very much
Take a look at this example: https://github.com/getindata/kedro-airflow-gke-example/blob/main/templates/gke_operator.pytpl from https://medium.com/@getindatatechteam/deploying-efficient-kedro-pipelines-on-gcp-composer-airflow-with-node-grouping-mlflow-a45e68d9f42f
The general idea is:
1. Initialize the MLflow run and save the run ID as an initial, separate step.
2. Inject the MLflow run ID into all other Kedro nodes (so they will log into the same run ID).
You can e.g. use Airflow's XCom for that, as in the linked example.
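The two steps above can be sketched roughly like this (a minimal sketch, not kedro-airflow's actual API: `build_task_env`, the task ids, image name, and tracking URI are all illustrative assumptions):

```python
def build_task_env(run_id: str, tracking_uri: str) -> dict:
    """Environment each downstream DockerOperator container receives.
    With MLFLOW_RUN_ID set, the mlflow client resumes the existing run
    instead of creating a new one."""
    return {
        "MLFLOW_RUN_ID": run_id,
        "MLFLOW_TRACKING_URI": tracking_uri,
    }

# Step 1, in the init task (requires mlflow; not executed here):
#   with mlflow.start_run() as run:
#       return run.info.run_id   # Airflow pushes the return value to XCom
#
# Step 2, in the DAG definition, pull the id into each node's container
# (assumes the operator's `environment` field is template-rendered):
#   DockerOperator(
#       task_id="train_model",
#       image="my-kedro-image",
#       environment=build_task_env(
#           "{{ ti.xcom_pull(task_ids='init_mlflow') }}",
#           "http://mlflow:5000",
#       ),
#   )
```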
👍 1
Thanks @marrrcin for the resource: after a careful read, the env variable passed via Airflow XComs is the MLflow run name (not the ID). Hence there will still be 1 task = 1 run each time a DockerOperator is used; they will just all have the same name. This does not unite all the Airflow DAG tasks under one unique MLflow run ID (which would allow all logged elements to live under one run, as MLflow intends). The blog has no screenshot of their MLflow runs, but I believe they would have as many runs as there are DAG tasks in the spaceflight_grouped Google Cloud screenshot. The only way to get a unique run would be a unique task in Airflow, which defeats the Docker containerization of the DockerOperator; and multiple runs per task defeat the purpose of MLflow for monitoring runs. Do you think it's feasible to have an mlflow_init step where:
1. we execute a CLI command / Python script that launches a run and then queries it for its ID, and
2. we make it an env variable and pass it via XCom to the conf of all subsequent tasks?
Thanks for your input 😊
👎 1
Read the whole thing and the code carefully: it does exactly what I said it does. Everything in the DAG will be logged under the same MLflow run ID. https://github.com/getindata/kedro-airflow-gke-example/blob/e301ed5e75a6c770b4a9bbcd472f89088dff468f/templates/gke_operator.pytpl#L137
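For context, the mechanism that makes this work can be sketched as follows (an illustrative sketch, not the template's exact code: `resolve_run_id` is my own helper name; the mlflow client honours an `MLFLOW_RUN_ID` environment variable, so each containerized node resumes the shared run rather than opening a fresh one):

```python
from typing import Optional

def resolve_run_id(environ: dict) -> Optional[str]:
    """Run id the mlflow client should resume; None means a fresh run."""
    return environ.get("MLFLOW_RUN_ID")

# Inside a node's container this would be roughly (requires mlflow):
#   import mlflow, os
#   with mlflow.start_run(run_id=resolve_run_id(dict(os.environ))):
#       mlflow.log_metric("accuracy", 0.93)  # lands in the shared run
```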
thanks for highlighting it 😊 will give it a second read 👍
thanks for linking the materials @marrrcin 🙂
😎 2
Nice to know the author is in the slack channel @Artur Dobrogowski 😊 we're figuring things out atm 🚀