#questions

Cory Maklin

09/07/2023, 4:51 PM
Hey everyone, does the Kedro versioned dataset feature work like Delta tables? In other words, if one job is reading a dataset and another job writes to that same dataset, will the first job continue to read the old version of the dataset?

datajoely

09/07/2023, 5:15 PM
No, it’s much, much more primitive: it just maintains a folder of different run results.
It’s a minimal solution for those with limited infra.
If you want Delta, use it!
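
To make the "folder of different run results" concrete, here is a minimal sketch in plain Python. It assumes a `<filepath>/<version>/<filename>` layout with timestamp-style version names; the helper names (`save_version`, `latest_version`, `load`) are hypothetical and are not Kedro's API. Note that nothing here does locking or transactions.

```python
from pathlib import Path
from typing import Optional
import tempfile


def save_version(filepath: Path, version: str, payload: str) -> Path:
    """Write one run's result into its own version folder (assumed layout)."""
    target = filepath / version / filepath.name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(payload)
    return target


def latest_version(filepath: Path) -> str:
    """Timestamp-style names sort lexicographically, so max() is the newest."""
    return max(p.name for p in filepath.iterdir() if p.is_dir())


def load(filepath: Path, version: Optional[str] = None) -> str:
    """Load a pinned version, or the newest one if none is given."""
    version = version or latest_version(filepath)
    return (filepath / version / filepath.name).read_text()


# Two runs write the "same" dataset; each lands in a separate folder.
root = Path(tempfile.mkdtemp())
dataset = root / "model_input.csv"
save_version(dataset, "2023-09-07T16.51.00.000Z", "a,b\n1,2\n")
save_version(dataset, "2023-09-08T09.10.00.000Z", "a,b\n3,4\n")

newest = load(dataset)
pinned = load(dataset, "2023-09-07T16.51.00.000Z")
```

Each version is a full, independent copy of the result, which is why this works with nothing more than a filesystem.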

Iñigo Hidalgo

09/08/2023, 8:35 AM
It should be possible to write a spark-less DeltaTable kedro dataset, though I'm not familiar enough with Delta Tables to know if the non-spark/non-databricks implementation also guarantees ACID.
Would be interesting to see how deltatable.history() could potentially interact with kedro's native versioning.

datajoely

09/08/2023, 9:10 AM
We have a pandas implementation too: https://docs.kedro.org/en/stable/kedro_datasets.pandas.DeltaTableDataSet.html. It uses the Rust library underneath, with no JVM.
I also think you can get to delta via Polars and Ibis
which use the same thing down the rabbit hole

Iñigo Hidalgo

09/08/2023, 9:11 AM
Oooh nice!! Yeah, I was able to interact with it using polars and duckdb here 🙂 https://github.com/inigohidalgo/delta-python/blob/main/main.py
not a kedro implementation though
How does the pandas deltatable work with kedro versioning?

datajoely

09/08/2023, 9:24 AM
so I don’t think we’ve enabled Kedro versioning, we just use the delta implementation
Kedro Versioning was always a minimal solution

Iñigo Hidalgo

09/08/2023, 9:31 AM
makes sense. thanks!
I guess they aren't really equivalent anyway: kedro versions are different iterations of the same dataset, but they could be completely different results, for example different model pickles from different training runs. The delta history, by contrast, is made up of diffs, so it's more of a way to follow how a certain table evolved over time, with new rows being added as time passes.
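
The "history made up of diffs" idea can be illustrated with a toy append-only commit log. This is only a conceptual sketch, not the Delta Lake implementation: the class `ToyLogTable` and its methods are hypothetical. It shows why a reader pinned to an earlier version keeps seeing the old rows after a later write.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyLogTable:
    """Toy append-only commit log; NOT how Delta Lake is actually implemented."""

    commits: list = field(default_factory=list)

    def append(self, rows: list, metadata: Optional[dict] = None) -> int:
        """Record a new commit (a diff of added rows) and return its version."""
        self.commits.append({"rows": list(rows), "metadata": metadata or {}})
        return len(self.commits) - 1

    def read(self, version: Optional[int] = None) -> list:
        """Rebuild the table by replaying commits up to and including `version`."""
        last = len(self.commits) - 1 if version is None else version
        out = []
        for commit in self.commits[: last + 1]:
            out.extend(commit["rows"])
        return out

    def history(self) -> list:
        """One entry per commit, carrying whatever metadata was attached."""
        return [{"version": v, **c["metadata"]} for v, c in enumerate(self.commits)]


table = ToyLogTable()
reader_version = table.append([{"id": 1}])  # a reader pins this version
table.append([{"id": 2}])                   # a later job appends more rows

old_view = table.read(reader_version)  # the pinned reader still sees one row
new_view = table.read()                # a fresh reader sees both rows
```

In contrast, two Kedro versions of a dataset do not build on each other; each is a standalone result.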

datajoely

09/08/2023, 9:52 AM
Yes, but I do wonder if we’re able to inject any metadata into those diffs, now you mention it

Iñigo Hidalgo

09/08/2023, 9:56 AM
what sort of metadata would you be including? some reference to kedro versions?

datajoely

09/08/2023, 9:57 AM
Yeah, our version ID is actually just our session ID
so you could marry the two worlds
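
One way to picture "marrying the two worlds" is to build a small metadata dict from the session ID and attach it to each Delta commit. The sketch below only covers the first half. It assumes a timestamp-style ID with millisecond precision; the key name `kedro_session_id` and both helpers are hypothetical, and how the dict is actually passed to a Delta writer depends on the library and version, so that step is not shown.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed timestamp style for the session/version ID.
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def make_session_id(now: Optional[datetime] = None) -> str:
    """Format a UTC timestamp, trimming microseconds down to milliseconds."""
    now = now or datetime.now(tz=timezone.utc)
    ts = now.strftime(VERSION_FORMAT)
    return ts[:-4] + ts[-1:]


def commit_metadata(session_id: str) -> dict:
    """Hypothetical metadata payload to attach to a Delta commit."""
    return {"kedro_session_id": session_id}


fixed = datetime(2023, 9, 8, 9, 57, 0, 123456, tzinfo=timezone.utc)
session_id = make_session_id(fixed)
metadata = commit_metadata(session_id)
```

With something like this stored per commit, an entry in the table history could be traced back to the run that produced it.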