Hey
@Mohamed El Guendouz, I can help with this request. I have done this extensively across GCP Dataproc Serverless, Dataproc on Compute Engine, and Airflow (Cloud Composer).
I am contributing a GCP Dataproc deployment guide to Kedro's official docs here:
https://github.com/kedro-org/kedro/pull/4393 (currently in draft). I can also talk about a lot more than the guide covers, e.g.:
• Dataproc on Compute Engine
• Dataproc provisioning
• CI/CD with DEV/PROD workflows (if that environment tiering pattern applies to you)
• Dataproc experimentation practices for Data Scientists
• GCP IAM practices
• Integrating storage and compute services such as GCS and BigQuery with Dataproc
• Common Dataproc errors / gotchas
Initially the guide is limited to Dataproc Serverless, but I will add more contributions if this one gets incorporated.
Please have a look and let me know if you have any questions 🙂
Also, to the Kedro maintainers: I would appreciate you taking a look at the PR.
CC:
@Ravi Kumar Pilla (as I mentioned in the attached thread, I will be contributing a guide on Dataproc)