Hello, team!
I have a question regarding best practices. I am developing a relatively classic ML solution that reads data from S3, runs ETL, and then trains and serves multiple models. Each model has a different preprocessing pipeline, while the ETL contains model-independent logic. I plan to use Kedro with the Kedro-MLflow plugin. I think the suggested
application architecture works great for me, but I have doubts about separation of concerns. My main concern is whether to keep the ETL and ML applications together in one repository. Here are some thoughts and inputs that I think will be useful for the decision:
1. I think each model will have its own repository with its own Kedro + Kedro-MLflow setup. The logic between the models and their pipelines is very different, and the teams working on them are expected to be independent. However, all teams depend on the same ETL output and will therefore have to sync on contract changes
2. The ETL and ML apps will very likely use different infrastructure: for example, AWS Batch and AWS SageMaker, respectively.
3. Both the ETL and ML apps are expected to be orchestrated with Kedro-Airflow
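To make point 1 more concrete, one option I am considering is having the ETL team publish an explicit, versioned data contract that each model repo pins as a dependency, instead of relying on implicit column names. A minimal sketch (all names, columns, and dtypes here are hypothetical placeholders, not our actual schema):

```python
# Hypothetical data-contract module that the ETL team would publish as a
# small versioned package; each model repo imports it and validates the
# ETL output schema before running its own preprocessing pipeline.

CONTRACT_VERSION = "1.2.0"  # placeholder version string

# Placeholder column names and dtypes for the ETL output dataset.
EXPECTED_COLUMNS = {
    "customer_id": "string",
    "event_ts": "timestamp",
    "amount": "double",
}


def validate_columns(columns: dict) -> list:
    """Compare an observed {name: dtype} schema against the contract.

    Returns a list of human-readable mismatches; an empty list means the
    ETL output satisfies the contract.
    """
    problems = []
    for name, dtype in EXPECTED_COLUMNS.items():
        if name not in columns:
            problems.append(f"missing column: {name}")
        elif columns[name] != dtype:
            problems.append(f"wrong dtype for {name}: {columns[name]} != {dtype}")
    return problems
```

A model repo could then call `validate_columns` in a small gate node at the start of its pipeline, so a contract change in the ETL fails fast with a clear message rather than surfacing as a training error downstream.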
Thank you very much for your help!