#questions

Fred Guth

02/19/2024, 10:37 PM
[noobie alert] First time trying Kedro here. I use DuckDB with its Python SDK and keep the data in Parquet files. How should I configure the catalog? Is there an example of a project using DuckDB?

Juan Luis

02/19/2024, 10:59 PM
Hi @Fred Guth! Have a look at https://github.com/deepyaman/jaffle-shop/blob/main/conf/base/catalog.yml and https://kedro.org/blog/building-scalable-data-pipelines-with-kedro-and-ibis. It's all a bit experimental, so if you're in doubt, don't hesitate to ask.

Deepyaman Datta

02/19/2024, 11:15 PM
@Fred Guth I would recommend using something like Ibis, as @Juan Luis mentioned above, because the dataframe API provides objects more akin to what Kedro works with. There is information about using Ibis with DuckDB on https://duckdb.org/docs/guides/python/ibis.html, as well as in the Ibis project docs. If you want to use the DuckDB Python API, it would need to be a bit different, as DuckDB seems to rely on variables in local scope for the Python API, and you'd need to make sure those variables are available in the Kedro nodes. I can try to take a look later, if you want to use this Python API rather than trying Ibis. Do let me know if you run into any issues with the Kedro-Ibis integration, should you go that route. I'm also in the process of adding the dataset to Kedro-Datasets, so hopefully it will become more accessible!

Iñigo Hidalgo

02/20/2024, 10:23 AM
+1 for ibis here. I haven't worked with their duckdb backend specifically, but their dataframe transformation syntax is super clean for the vast majority of transformations you would want to do

Fred Guth

02/21/2024, 12:42 PM
Thank you all, I will give ibis a try.