DeepSeek released a DuckDB-based data processing en...
# resources
n
DeepSeek released a DuckDB-based data processing engine: https://github.com/deepseek-ai/smallpond
@datajoely
Thought you might be interested. The codebase actually looks small enough that I may spend some time looking at it.
j
what is "data processing framework" in this context? a dataframe library?
n
So it's DuckDB + 3FS (DeepSeek's own filesystem)
d
I saw!
cool stuff
n
I guess it's an orchestrated DuckDB run on "local" SSD?
🤔 1
I haven't read it yet, but I'm guessing 3FS deals with the I/O and network part while DuckDB handles the compute, mostly locally.
image.png
looks like it has some lightweight DB components there
Playing with the example, it also uses ray[core] for scheduling
👀 1
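Rough sketch of how I picture it, not smallpond's actual code: hash-partition the data, run SQL on each partition locally, let a scheduler fan the tasks out. Everything here is a stand-in so it runs with the standard library only: sqlite3 plays the role of DuckDB, a thread pool plays the role of Ray, and the rows, function names, and query are made up for illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

# Hypothetical toy data: (ticker, price) rows.
ROWS = [("AAA", 10.0), ("BBB", 5.0), ("AAA", 12.5), ("CCC", 7.0), ("BBB", 4.0)]


def hash_partition(rows, n):
    """Split rows into n partitions using a stable hash of the key column."""
    parts = [[] for _ in range(n)]
    for ticker, price in rows:
        parts[crc32(ticker.encode()) % n].append((ticker, price))
    return parts


def run_partition(rows):
    """Run the SQL over one partition in a private in-memory database.

    sqlite3 is only a stand-in for the local SQL engine here.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (ticker TEXT, price REAL)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    out = con.execute(
        "SELECT ticker, MIN(price), MAX(price) FROM t GROUP BY ticker"
    ).fetchall()
    con.close()
    return out


def min_max_by_ticker(rows, n_partitions=3):
    """Fan partitions out to workers (thread pool as a stand-in scheduler)."""
    parts = hash_partition(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = pool.map(run_partition, parts)
    # Each ticker lives in exactly one partition, so no second merge step is needed.
    return sorted(row for part in results for row in part)


if __name__ == "__main__":
    print(min_max_by_ticker(ROWS))
```

Partitioning by the GROUP BY key is what makes the per-partition results final. With a different partitioning you would need another aggregation pass on top.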
j
interesting
n
https://mehdio.substack.com/p/duckdb-goes-distributed-deepseeks There is now a more detailed blog post about this.
j
the name of the project seems to be a reference to Google's BigLake 🙃
Mehdi's writeup is great 💯