<https://onetable.dev/>
# resources
j
m
1. We need more formats. 2. We need more frameworks. 3. We need more abstraction layers. 🧌
j
Until somebody tells me I'm doing something terribly wrong, I'm staying with Delta 😅
d
In general, I think Iceberg often gets chosen because it works better with other execution engines and isn't so tied to Spark/Databricks. But having the abstraction layer is nice, especially when building products that could be deployed in many organizations, where you can't choose the storage format for them.
j
Well, I see that Polars supports both, so I'll give Iceberg a try next time!
d
The second one is a very Hudi-biased article. 🙂 But let me ask around for some resources. I know somebody who's much more knowledgeable about these things.
Here's a very Iceberg-biased article that makes the case for storage independent of execution: https://tabular.medium.com/the-case-for-independent-storage-74ac880092d9 🙂 (Tabular is an enterprise Iceberg company). I didn't get any great references (will update if I do!), but it seems that a lot of Chinese companies started with Hudi, ran into challenges, and have since been migrating to Iceberg. Iceberg also sees significant use at companies like Netflix (which created it) and Apple. As for Delta, the main issue seems to be that it was never really intended as an independent file format; it is mostly focused on how it fits into the Spark ecosystem. We probably need some more unbiased benchmarks. I'm guessing some of the pro-Iceberg sentiment in the paragraph above is also more focused on streaming use cases.