# questions
r
Hi, I'm looking for the most suitable option for our use case, which is a mix of data engineering and data science. The possible solutions we are looking at are:
• dbt https://docs.getdbt.com/
• kedro
• sqlmesh
Our main concerns are:
• time to market
• scalability
• maintainability
• etc.
The question that bugs me the most is whether there is some SQL query push-down solution in Kedro. I saw https://ibis-project.org/, which looks like a dataframe solution for query push-down, but in certain cases I would like to push my SQL model down directly. Does anybody have any idea?
d
So dbt is conceptually very similar to Kedro, although it leans into the metrics / semantic model part more. Kedro is a Python-based tool for data engineering and data science. dbt is a tool for building SQL transformations, mostly in the data engineering space.
This blog post is slightly out of date, because the Ibis datasets are now in Kedro: https://kedro.org/blog/building-scalable-data-pipelines-with-kedro-and-ibis
But in my opinion the modern way to use Kedro is with Ibis so that you’re also using SQL as an execution engine just like dbt
r
Hey, thanks for the super fast response. That's exactly what I want, because my transformations consist of heavy loads which I would like to run on the RDBMS (Snowflake), and ML models that are better run on OpenShift. I don't like an unnecessary abstraction layer if it's not needed, and most of the time data engineers feel more comfortable writing SQL than a Python dataframe quasi-SQL.
❄️ 1
d
Yeah so ibis is the best
And it also makes your code portable so one syntax runs on any backend
I also love the pattern of having dev run on DuckDB locally and prod/staging running on BQ/Snowflake etc
👍 1
Like you said, it comes down to team topologies too: SQL has a lower barrier to entry, so if you have a constrained analyst team then dbt is great. If you want everyone working in the same language, Ibis / Python / Kedro is a great fit
You can if you want: use ibis.sql(…) and just write SQL in Python if you want to
👍 1
SQLMesh is also new and interesting but still quite an early-days project. It is still SQL-first like dbt, but benefits massively from seeing some of the mistakes dbt made first time around
d
One more blog post, with an example of how to use the Kedro Ibis dataset: https://kedro.org/blog/sql-data-processing-in-kedro-ml-pipelines
d
@Dmitry Sorokin I’m not able to do this where I am, but we need an activity to update all of these blog posts to use the datasets in Kedro extras
👍 2
d
+1 to what @datajoely said about ibis.sql. I wrote an article a couple of months ago about how this all works, and tl;dr it's more-or-less equivalent if you drop into ibis.sql.