Leo Casarsa 02/09/2023, 5:09 PM
Workflow orchestration tools
I have been crushing through the documentation of a bunch of different workflow orchestration tools. This is my mental map so far. [...]
Kubeflow, Metaflow, Flyte, Kedro, and ZenML focus more on ML pipelines and on usability for experimentation, such as easy switching between local and cloud execution. Kubeflow is to ML what Argo is to data flows, so expect a steep learning curve if you are not a Kubernetes expert; most data scientists are not, which might explain why it is frowned upon. All of these are new and shiny, but again I need to dig a little deeper to understand the differences. Kedro is opinionated about project structure and does not seem to be built with big, scalable workflows in mind. I got the feeling that Kedro is like DVC but aimed more specifically at ML, so it might be a good fit for consultants who build many smaller projects (?). Metaflow, Flyte, and ZenML all deal with how to use compute clusters in an easy way. ZenML seems to me like it might have some gaps, but it is also the newest one, so that is to be expected at this point in time.
Another member then replies:
Thanks for starting the thread, it's very interesting!
I'd like to clarify that Kedro is a Python library for building modular data science pipelines. Kedro helps you write data science workflows that are made of reusable components, each with a "single responsibility".
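To make the "reusable components with a single responsibility" idea concrete, here is a minimal, library-free sketch. It deliberately does not use Kedro's real API: `Node` and `run_pipeline` are hypothetical names chosen for illustration. Each step is a plain function that knows nothing about where its data comes from, and steps are wired together only through named datasets.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    """One pipeline step with a single responsibility (hypothetical, not Kedro's class)."""
    func: Callable[..., Any]
    inputs: Tuple[str, ...]
    output: str


def run_pipeline(nodes: List[Node], data: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in list order, reading inputs from and writing outputs to named datasets."""
    datasets = dict(data)  # copy so the caller's dict is not mutated
    for n in nodes:
        args = [datasets[name] for name in n.inputs]
        datasets[n.output] = n.func(*args)
    return datasets


# Each function does exactly one thing and can be reused or tested on its own.
def clean(raw: List[float]) -> List[float]:
    return [x for x in raw if x >= 0]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


pipeline = [
    Node(clean, ("raw",), "cleaned"),
    Node(mean, ("cleaned",), "avg"),
]

result = run_pipeline(pipeline, {"raw": [1.0, -2.0, 3.0]})
print(result["avg"])  # 2.0
```

Because `clean` and `mean` are ordinary functions, they can be unit-tested or recombined in another pipeline without any changes; only the dataset names connect them.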
Kedro is not an orchestration tool like Argo Workflows or Kubeflow Pipelines. Check out the deployment guide for how to run Kedro pipelines on Airflow, Argo Workflows or Kubeflow Pipelines. We have successfully used Kedro to build data-science-friendly pipelines that we can still run at scale with Kubeflow Pipelines.
https://mlops-community.slack.com/archives/C015J2Y9RLM/p1675865574676169
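One way to picture the split between a pipeline library and an orchestrator: the pipeline declares which datasets each step reads and writes, and a deployment turns each step into an orchestrator task and each shared dataset into a dependency edge. The sketch below is hypothetical and uses only the standard library (`graphlib`, Python 3.9+); it does not call Airflow, Argo Workflows, Kubeflow Pipelines, or Kedro, and the node names are made up for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical node specs: name -> (input dataset names, output dataset names).
# Listed out of order on purpose; the order is derived from the data dependencies.
NODES: Dict[str, Tuple[List[str], List[str]]] = {
    "train_model": (["features"], ["model"]),
    "clean_data": (["raw"], ["cleaned"]),
    "build_features": (["cleaned"], ["features"]),
}


def task_dependencies(nodes: Dict[str, Tuple[List[str], List[str]]]) -> Dict[str, Set[str]]:
    """Map each node to the nodes that produce its inputs (the edges an orchestrator needs)."""
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    return {
        name: {producer[i] for i in ins if i in producer}  # external inputs add no edge
        for name, (ins, _) in nodes.items()
    }


deps = task_dependencies(NODES)
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['clean_data', 'build_features', 'train_model']
```

The pipeline definition stays orchestrator-agnostic; only the final translation step (here, a topological order) would differ between targets.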
Juan Luis 02/09/2023, 5:11 PM
Toni - TomTom - Madrid 02/10/2023, 5:31 PM
Juan Luis 02/13/2023, 9:02 AM