Kedro is an open-source Python framework for creating maintainable and modular data science code.

Hi all, I'm hitting a `java.lang.OutOfMemoryError: Java heap space` error when storing a JSON file of 2.5M rows on AWS S3 via a Kedro pipeline. The ECS compute already has 104 GB of memory.
Any recommendations on how to configure this? Any experience with repartitioning? Spark config? Or a way to work around it?

This is a Spark configuration error, not actually a Kedro one. So yeah, edit `spark.yml` to set it from within Kedro, or provide env vars to set it independently of Kedro.
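
For the env var route, here is a rough sketch of what that could look like. It assumes PySpark is what starts the JVM (so `PYSPARK_SUBMIT_ARGS` is picked up) and that the heap that runs out is the driver's, which is the usual case if Spark runs in local mode inside the container. The memory values and the `build_submit_args` helper are made up for illustration; size them to your own workload. The same `spark.*` keys could instead go into `spark.yml` as plain key-value pairs, if your project loads that file into its Spark session.

```python
import os

# Hypothetical values, not a recommendation: leave headroom below the
# container's 104 GB for the OS and the Python process.
spark_options = {
    "spark.driver.memory": "32g",
    "spark.executor.memory": "32g",
    "spark.driver.maxResultSize": "8g",
}


def build_submit_args(options):
    """Render Spark options as a PYSPARK_SUBMIT_ARGS string (illustrative helper)."""
    flags = [f"--conf {key}={value}" for key, value in sorted(options.items())]
    # PySpark expects the argument string to end with "pyspark-shell".
    return " ".join(flags + ["pyspark-shell"])


# Set this before the first SparkSession/SparkContext is created:
# the JVM heap size is fixed when the JVM starts and cannot be raised afterwards.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(spark_options)
```

The heap settings only take effect if they are in place before Spark starts, so in a Kedro project this would have to run (or the variable be exported in the ECS task definition) before any Spark dataset is loaded or saved.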