# questions
p
I'm doing a bit of investigation into kedro, and wondered about the best way of running pipelines in a k8s cluster. I see there's a kubeflow plugin https://github.com/getindata/kedro-kubeflow, but the project seems little used and not very actively maintained. The same goes for the airflow plugin https://github.com/getindata/kedro-airflow-k8s. There's also argo workflows, but the docs showing how to do this are out of date https://docs.kedro.org/en/stable/deployment/argo.html, and there's a plugin, kedro-argo, but that seems out of date too. So I guess my question is: what's the recommended way to run distributed pipelines in a k8s cluster using well-maintained, production-ready tooling?
m
GetInData's plugin for Airflow is no longer needed - everything can now be achieved with Kedro's official Airflow plugin - see https://getindata.com/blog/deploying-kedro-pipelines-gcp-composer-airflow-node-grouping-mlflow/
It's a matter of having Airflow and using the appropriate k8s operator - at the end of the day it's all about creating your own template to generate the Airflow DAG.
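To make the "own template" idea a bit more concrete, here is a rough sketch of rendering an Airflow DAG file from a string template, with one KubernetesPodOperator task per Kedro node. This is not what the official plugin generates; the image name, DAG id, node names and the `render_dag` helper are all made up for illustration. It also assumes a recent Airflow (the `schedule` argument) and a recent cncf.kubernetes provider; the import path of KubernetesPodOperator has moved between provider releases, so adjust it to whatever is installed.

```python
from string import Template

# Header of the generated DAG file. The import path and the `schedule`
# argument are assumptions about the installed Airflow/provider versions.
DAG_TEMPLATE = Template(
    '''from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(dag_id="$dag_id", start_date=datetime(2024, 1, 1), schedule=None, catchup=False):
    tasks = {}
$tasks$edges'''
)

# One pod per Kedro node; every pod runs the same (hypothetical) project image.
TASK_TEMPLATE = Template(
    '    tasks["$node"] = KubernetesPodOperator(\n'
    '        task_id="$node",\n'
    '        name="$node",\n'
    '        image="$image",\n'
    '        cmds=["kedro"],\n'
    '        arguments=["run", "--pipeline=$pipeline", "--node=$node"],\n'
    '    )\n'
)


def render_dag(dag_id, image, pipeline, edges):
    """Return DAG source code for (upstream, downstream) node-name pairs."""
    nodes = sorted({name for edge in edges for name in edge})
    task_lines = "".join(
        TASK_TEMPLATE.substitute(node=n, image=image, pipeline=pipeline) for n in nodes
    )
    edge_lines = "".join(
        f'    tasks["{up}"] >> tasks["{down}"]\n' for up, down in edges
    )
    return DAG_TEMPLATE.substitute(dag_id=dag_id, tasks=task_lines, edges=edge_lines)


if __name__ == "__main__":
    # Hypothetical values; write the result into the Airflow dags folder.
    print(
        render_dag(
            dag_id="kedro_demo",
            image="my-registry/my-kedro-project:latest",
            pipeline="__default__",
            edges=[("preprocess", "train"), ("train", "evaluate")],
        )
    )
```

The rendered text is just a normal DAG file, so it can be reviewed and version-controlled like hand-written ones.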
p
OK thanks, so in summary the best approach is to run airflow in the k8s cluster and to use the official airflow plugin.
m
It's not the best (there's never such a thing 🙂), it's one of the ways of doing that. Depending on what you already have in your k8s cluster, you might take different approaches.
p
yeah, atm I have an ad hoc system of orchestrating a number of etl type tasks, mediated by passing messages in redis streams. I'm looking at ways to improve this...
m
You can also go down the route of other tools that help with ETL and re-purpose them for running Kedro. Once you package your Kedro project into a Docker image, you can run it basically anywhere you want.
The "only trick" the plugins usually do is translate the Kedro pipeline into the orchestrator's DSL (e.g. a Kedro pipeline into an Airflow DAG, or a Kedro pipeline into a Kubeflow Pipeline). The steps in the orchestrator usually call kedro like this:
kedro run --pipeline=<name> --node=<name of the single node>
and that's the general idea.
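As a minimal sketch of that translation step (not how any particular plugin implements it): given which nodes depend on which, work out a valid execution order and build the single-node command for each step. The node names and the dependency mapping are hypothetical; in a real project they would come from the Kedro pipeline itself. The `--node` flag is copied from the command above, and the exact spelling may differ between Kedro versions, so check `kedro run --help`.

```python
import shlex
from graphlib import TopologicalSorter

# Hypothetical pipeline: node name -> set of upstream nodes it depends on.
DEPENDENCIES = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
}


def node_command(pipeline, node):
    """Build the argv that runs exactly one node of the given pipeline."""
    return ["kedro", "run", f"--pipeline={pipeline}", f"--node={node}"]


def orchestrator_steps(pipeline, dependencies):
    """Return (node, argv) pairs ordered so upstream nodes come first."""
    order = TopologicalSorter(dependencies).static_order()
    return [(node, node_command(pipeline, node)) for node in order]


if __name__ == "__main__":
    for node, argv in orchestrator_steps("__default__", DEPENDENCIES):
        print(node, "->", shlex.join(argv))
```

Each (node, argv) pair maps to one step in whatever orchestrator is used, typically a container started from the project's Docker image with that argv as its command.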
p
yeah, I need to do some experiments; I want to get a hello world example running e2e, but that's why I'm trying to figure out which orchestrator to start with.