# questions
Hey everyone, does the Kedro versioned dataset feature work like Delta tables? In other words, if one job is reading a dataset and another job writes to that same dataset, will the first job continue to read the old version of the dataset?
No, it’s much, much more primitive: it just maintains a folder of different run results
It’s a minimal solution for those with limited infra
if you want Delta use it!
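To make the "folder of different run results" idea concrete, here is a rough sketch of the layout as I understand it: each save lands in its own timestamped folder under the dataset path. This is a toy model using only the standard library, not Kedro's actual code; `save_version`, `resolve_latest`, `load_version` and `demo` are hypothetical helpers, and the `<filepath>/<version>/<filename>` layout is an assumption based on how versioned datasets appear on disk.

```python
import tempfile
from pathlib import Path


def save_version(dataset_path: Path, version: str, payload: str) -> Path:
    """Write payload to <dataset_path>/<version>/<filename> (assumed layout)."""
    target = dataset_path / version / dataset_path.name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(payload)
    return target


def resolve_latest(dataset_path: Path) -> str:
    """Pick the newest version folder; timestamp names sort lexicographically."""
    versions = sorted(p.name for p in dataset_path.iterdir() if p.is_dir())
    return versions[-1]


def load_version(dataset_path: Path, version: str) -> str:
    """Read the file stored under one specific version folder."""
    return (dataset_path / version / dataset_path.name).read_text()


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        dataset = Path(tmp) / "model_input.csv"
        save_version(dataset, "2023-09-01T10.00.00.000Z", "a,b\n1,2\n")
        pinned = resolve_latest(dataset)  # job A resolves its version once
        # Job B writes a new version while job A is still working.
        save_version(dataset, "2023-09-01T11.00.00.000Z", "a,b\n3,4\n")
        return {
            "pinned_version": pinned,
            "job_a_reads": load_version(dataset, pinned),
            "latest_version": resolve_latest(dataset),
        }
```

In this sketch a reader that pinned its version keeps seeing the old file only because the writer never touches that folder. Nothing here is transactional, which is the main difference from Delta.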
It should be possible to write a spark-less DeltaTable kedro dataset, though I'm not familiar enough with Delta Tables to know if the non-spark/non-databricks implementation also guarantees ACID.
Would be interesting to see how the
could potentially interact with kedro's native versioning.
We have a pandas implementation too: https://docs.kedro.org/en/stable/kedro_datasets.pandas.DeltaTableDataSet.html. It uses the Rust library underneath, with no JVM
I also think you can get to delta via Polars and Ibis
which use the same thing down the rabbit hole
Oooh nice!! Yeah, I was able to interact with it using polars and duckdb here 🙂 https://github.com/inigohidalgo/delta-python/blob/main/main.py
not a kedro implementation though
How does the pandas deltatable work with kedro versioning?
so I don’t think we’ve enabled Kedro versioning, we just use the delta implementation
Kedro Versioning was always a minimal solution
makes sense. thanks!
I guess they aren't really equivalent anyway. Kedro versions are different iterations of the same dataset, but they could be completely different results, for example different model pickles from different training runs. The Delta history, by contrast, is made up of diffs, so it's more of a way to follow how a certain table evolved over time, with new rows being added as time passes
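A toy illustration of that distinction, in plain Python. This is only a mental model, not how either tool stores data: real Delta commits record added and removed data files rather than individual rows, and `table_at`, `commits` and `kedro_versions` are made-up names for the example.

```python
from typing import Dict, List

# Kedro-style versions: independent artifacts that share a dataset name.
# One version tells you nothing about the contents of another.
kedro_versions: Dict[str, str] = {
    "2023-09-01T10.00.00.000Z": "model_pickle_from_run_1",
    "2023-09-02T10.00.00.000Z": "model_pickle_from_run_2",
}

# Delta-style history: an ordered log of changes to one table.
commits: List[dict] = [
    {"added_rows": [{"day": "mon", "sales": 10}]},
    {"added_rows": [{"day": "tue", "sales": 12}]},
    {"added_rows": [{"day": "wed", "sales": 9}]},
]


def table_at(log: List[dict], version: int) -> List[dict]:
    """Replay append-only commits 0..version to rebuild the table state."""
    rows: List[dict] = []
    for entry in log[: version + 1]:
        rows.extend(entry["added_rows"])
    return rows
```

Here `table_at(commits, 0)` gives the table as it looked after the first write, and `table_at(commits, 2)` gives all three rows, whereas looking up a key in `kedro_versions` just returns whatever that run happened to produce.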
Yes, but I do wonder if we’re able to inject any metadata into those diffs, now you mention it
what sort of metadata would you be including? some reference to kedro versions?
Yeah, our version ID is actually just our session ID
so you could marry the two worlds
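One way to picture "marrying the two worlds": every commit to the table carries the Kedro session ID as metadata, so you can map a Kedro run back to the table version it produced. This is a purely hypothetical sketch on a toy in-memory log; `commit`, `version_for_session` and the `kedro_session_id` key are invented for illustration and are not part of Kedro or any Delta library.

```python
from typing import List, Optional


def commit(log: List[dict], added_rows: List[dict], kedro_session_id: str) -> int:
    """Append a commit tagged with the session ID; return its table version."""
    log.append(
        {
            "added_rows": added_rows,
            "metadata": {"kedro_session_id": kedro_session_id},
        }
    )
    return len(log) - 1


def version_for_session(log: List[dict], kedro_session_id: str) -> Optional[int]:
    """Find the first table version written by the given Kedro session."""
    for version, entry in enumerate(log):
        if entry["metadata"]["kedro_session_id"] == kedro_session_id:
            return version
    return None


# Two runs, each identified by its (timestamp-like) session ID.
log: List[dict] = []
first = commit(log, [{"day": "mon", "sales": 10}], "2023-09-01T10.00.00.000Z")
second = commit(log, [{"day": "tue", "sales": 12}], "2023-09-02T10.00.00.000Z")
```

With that tag in place, "load the table as written by run X" becomes a lookup from session ID to table version, followed by a normal time-travel read.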