# random
g
This is nice. I have been looking into Ibis and I think it is a really good way for Kedro projects to move away from explicit SQL: https://kedro.org/blog/sql-data-processing-in-kedro-ml-pipelines
:ibis: 4
👍 5
j
Thanks @Galen Seilis!! This was authored by @Deepyaman Datta, our Ibis expert on the Kedro TSC :) with input from the Ibis team as well as @Iñigo Hidalgo and @datajoely
❤️ 1
d
I'm slightly biased, but I do think Ibis and Kedro work together to enable Python-first workflows for traditionally SQL-heavy data engineering, and they do so well, at that! A couple more resources on this topic:
• @Juan Luis and I gave a tutorial at PyData London: https://www.youtube.com/watch?v=ffDHdtz_vKc (material: https://github.com/ibis-project/kedro-ibis-tutorial)
• I shared a previous blog post on integrating Kedro and Ibis (before the official dataset release): https://kedro.org/blog/building-scalable-data-pipelines-with-kedro-and-ibis
Would love to see what you do with Kedro and Ibis!
💡 1
@Juan Luis Oops, the post above was actually written by @Dmitry Sorokin!
💯 1
f
So now we have to review the dbt/kedro split 😄 Before, my mental model was always kedro -> python, dbt -> sql.
d
@Florian d I think that split still works on a syntax level. SQL has a much lower barrier to entry than Python. Writing well-tested, robust SQL code is easier than doing the same in Python. Python is better when you need dynamism, loops, API calls, etc.
👍 1
Ibis makes Kedro a great fit for SQL back-ends where before it wasn’t
💯 1
unless you used Spark
f
Agreed. There is also the consideration of supporting multiple frameworks in a team vs. one, balanced against using the perfect fit for each purpose.
💯 2
d
(Putting aside that I'm going around giving talks with clickbait-y titles like "Analytics engineering without dbt?"...)
> So now we have to review the dbt/kedro split 😄 Before, my mental model was always kedro -> python, dbt -> sql
@Florian d I think this is still fair! The way I generally approach it right now is, there are a lot of people using Python (and frameworks like Kedro) who are doing objectively bad things, like extracting all the data from their database to do ETL in pandas. My first priority is to get those people to use Ibis when they are talking to a database (where traditionally they should have been doing their work with SQL). Beyond that, as @datajoely says, there are places where Python can give you more power (parametrization, testing), as well as the benefit of being able to switch engines seamlessly (DuckDB SQL and Snowflake SQL aren't the same, and Ibis can largely alleviate those differences, so that you can use the right backend for local dev and for prod deployment). But, by and large, if you're happy using SQL, you can keep using SQL.
🥳 1