Fred Guth

02/19/2024, 10:37 PM
[newbie alert] First time trying Kedro here. I use DuckDB with its Python SDK. I keep the data in Parquet files. How should I configure the catalog? Is there any example of a project using DuckDB?

Juan Luis

02/19/2024, 10:59 PM
hi @Fred Guth! have a look at and it’s all a bit experimental, so if you’re in doubt, don’t hesitate to ask

Deepyaman Datta

02/19/2024, 11:15 PM
@Fred Guth I would recommend using something like Ibis, as @Juan Luis mentioned above, because the dataframe API provides objects more akin to what Kedro works with. There is information about using Ibis with DuckDB on, as well as in the Ibis project docs. If you want to use the DuckDB Python API, it would need to be a bit different, as DuckDB seems to rely on variables in local scope for the Python API, and you'd need to make sure those variables are available in the Kedro nodes. I can try and take a look later, if you want to use this Python API rather than trying Ibis. Do let me know if you run into any issues with the Kedro-Ibis integration, should you go that route. I'm also in the process of adding the dataset to Kedro-Datasets, so hopefully it will become more accessible!

Iñigo Hidalgo

02/20/2024, 10:23 AM
+1 for Ibis here. I haven't worked with their DuckDB backend specifically, but their dataframe transformation syntax is super clean for the vast majority of transformations you would want to do.

Fred Guth

02/21/2024, 12:42 PM
Thank you all, I will give Ibis a try.
