Kedro + MLflow for MLOps
# questions
j
Hello! In our organization we are considering using Kedro with the MLflow plugin as our main MLOps framework, and I would like to hear about other experiences: does this combination cover most of the main aspects of MLOps in your opinion, or will we need to add more tools to our stack? I know it's a very open question, but I would be extremely interested to hear what technology stack you use in your organizations to cover the typical needs of machine learning projects.
m
You'll probably also need some CI/CD toolkit and something to scale up data processing/training (depending on your scale). Up to a point, Kedro alone will be enough (in-memory processing); see the sketch below for what scaling beyond that can look like.
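A minimal sketch of what "scaling up" can look like inside a Kedro node: the node receives a Spark DataFrame instead of a pandas one, so the data never has to fit in memory on a single machine. The function, dataset, and column names here are illustrative, not from the thread.

```python
# Hypothetical Kedro node that aggregates with Spark instead of pandas.
# Wire the "events" input to a Spark dataset in the catalog so Kedro
# passes a pyspark.sql.DataFrame; the computation then stays on the cluster.
from pyspark.sql import DataFrame
from pyspark.sql import functions as F


def count_events_per_user(events: DataFrame) -> DataFrame:
    """Lazily aggregate on the cluster; nothing is collected to the driver."""
    return events.groupBy("user_id").agg(F.count("*").alias("n_events"))
```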
j
We have a Spark cluster (for cases where datasets are bigger than memory) and some CI/CD tooling (Jenkins), so I guess we're covered! Thank you for the answer!
y
MLOps is a wide field, and people do not all mean the same thing by it. I think Kedro and kedro-mlflow (disclaimer: I'm the author) really help you develop faster and track experiments. This tremendously increases development speed, reduces code maintenance, and gives you some "self-documentation", since all projects are standardized. It also gives you a way to deploy your applications easily with the pipeline_ml_factory function (see the first sketch below), even if you will still have to develop some custom pipelines / APIs around it at some point. You will also hit memory limits at scale, but that really depends on what kind of applications you develop (more data-engineering heavy or more ML oriented).

The second step is to "transfer" (whatever that means) your app from your computer to a production environment. You will need at least version control and a basic CI/CD setup, as @marrrcin says, and possibly infrastructure and network tooling, depending on whether that is provided for you. Kedro helps here because it standardizes your project, and the more standard a project is, the easier it is to automate its deployment with CI/CD. But it does not replace all these other tools; it only simplifies their use.

The third step is to "operate" your model in a production environment. That means tracking health and failures and managing the infra in general; specifically for the data team, you need to orchestrate the input data pipelines and track metrics, possibly adding automatic metric tracking (kedro-mlflow can help here; see the second sketch below) and triggering automatic redeployment (Kedro and your CI/CD will help here). You will need other dedicated tools for this. You do not need to do everything at first, and maybe you will never need advanced production tooling if your applications are not critical.

My best guess is that if your org is not very data mature, Kedro and kedro-mlflow will help you deliver faster, so they're a good starting point, but they are by no means "sufficient" or "all you need". At minimum, git + a git forge like GitHub + a basic CI/CD (maybe GitHub Actions) + a target prod environment (a VM or a container orchestrator) are mandatory. You will add other tools (or not) later, depending on the issues you face along the way ;)
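For reference, a minimal sketch of the pipeline_ml_factory usage mentioned above, based on kedro-mlflow's documented API. The project layout, pipeline names, and input_name are illustrative, and exact imports may vary with your kedro-mlflow version.

```python
# pipeline_registry.py -- hypothetical project layout; "training" and
# "inference" are assumed to be ordinary Kedro pipelines defined elsewhere.
from kedro.pipeline import Pipeline
from kedro_mlflow.pipeline import pipeline_ml_factory

from my_project.pipelines import inference, training  # illustrative modules


def register_pipelines() -> dict[str, Pipeline]:
    training_pipeline = training.create_pipeline()
    inference_pipeline = inference.create_pipeline()

    # Bundle training + inference so that, when training runs, the fitted
    # artifacts and the inference pipeline are logged to MLflow together
    # as one serveable model.
    training_pipeline_ml = pipeline_ml_factory(
        training=training_pipeline,
        inference=inference_pipeline,
        input_name="instances",  # the inference pipeline's raw-data input
    )

    return {
        "training": training_pipeline_ml,
        "inference": inference_pipeline,
        "__default__": training_pipeline_ml,
    }
```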
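And a minimal sketch of the kind of metric tracking described in the third step, using the plain MLflow client API (kedro-mlflow wraps the same calls behind catalog dataset types). The run name, parameter, and metric values are illustrative.

```python
import mlflow

# Illustrative values; in a Kedro project, kedro-mlflow opens the run for
# you and can log metrics via catalog entries instead of explicit calls.
with mlflow.start_run(run_name="nightly-retrain"):
    mlflow.log_param("model_type", "random_forest")
    mlflow.log_metric("rmse", 0.42)
```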
j
Thank you so much, Yolan, that is extremely helpful advice!!! It's a huge field, yes, and quite intimidating as well. Your message is a good map for orienting ourselves.
m
You can also get some inspiration for Kedro-centric MLOps here:

https://www.youtube.com/watch?v=dRT5bHbLYos

😎
🥳 1
🙂 1