#resources

Juan Luis

12/10/2023, 9:23 PM

marrrcin

12/11/2023, 8:05 AM
1. We need more formats. 2. We need more frameworks. 3. We need more abstraction layers. 🧌

Juan Luis

12/11/2023, 8:07 AM
until somebody tells me I'm doing something terribly wrong, I'm staying with Delta 😅

Deepyaman Datta

12/11/2023, 2:34 PM
In general, I think Iceberg often gets chosen due to working better with other execution engines (and not being so tied to Spark/Databricks). But having the abstraction layer is nice, especially when building products that could be deployed in many organizations, where you can't go choose their storage format.
👀 1

Juan Luis

12/11/2023, 2:52 PM
well I see that Polars has support for both, I'll give Iceberg a try next time!

Deepyaman Datta

12/11/2023, 3:42 PM
The second one is a very Hudi-biased article. 🙂 But let me ask for some resources, I know somebody who's much more knowledgeable about these things.
💯 1
Here's a very Iceberg-biased article that makes the case for storage independent of execution: https://tabular.medium.com/the-case-for-independent-storage-74ac880092d9 🙂 (Tabular is an enterprise Iceberg company.) I didn't get any great references (will update if I do!), but it seems a lot of Chinese companies started with Hudi, ran into challenges, and have since been migrating to Iceberg. Iceberg does see significant use at some companies like Netflix (because they created it) and Apple. Re Delta, the main issue seems to be that it was never really intended as an independent file format and is mostly focused on how it plays into the Spark ecosystem. We probably need some more unbiased benchmarks. I'm guessing some of the pro-Iceberg sentiment in the above paragraph is also more focused on streaming use cases.
👀 1