Kedro is an open-source Python framework for creating maintainable and modular data science code.

Kedro

https://medium.com/datamindedbe/use-dbt-and-duckdb-instead-of-spark-in-data-pipelines-9063a31ea2b5

Wonder how far DuckDB can push back the need for Spark :eyes:

Nice article by the wonderful folks at Data Minded, a Belgian data engineering consultancy (who also run an academy and a managed product)!

Together with Polars, DuckDB greatly reduces the need for Spark. Unless there is no way to process your data on a single machine, Spark is almost always overkill. The way I see it: use DuckDB + dbt if you want to go SQL all the way, use Polars if you want to stay in Python (best fit if you use Kedro, IMO), and use Spark if you have to process TBs of data.

And Ibis could potentially enable a similar workflow with Kedro.