Kedro is an open-source Python framework for creating maintainable and modular data science code.

Kedro

Hi Team, has anyone tried using versioning on `SparkDataSet`? I'm trying to version a CSV. The odd thing is that it fails with a `VersionNotFoundError`, yet the new version still gets saved. Any ideas?
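A rough way to picture what a versioned dataset does, assuming it follows Kedro's documented `<filepath>/<version>/<filename>` layout. My understanding, which I haven't checked against every Kedro release, is that right after writing, the dataset looks up the latest version by listing `<filepath>/*/<filename>`. If that listing comes back empty (for example, because the filesystem listing doesn't see what Spark wrote), it raises `VersionNotFoundError` even though the write already happened. The pure-Python sketch below mimics that flow on the local filesystem. `VersionNotFound`, `save`, `versioned_path` and `latest_version` are hypothetical names, not Kedro's API.

```python
from pathlib import Path


class VersionNotFound(Exception):
    """Hypothetical stand-in for Kedro's VersionNotFoundError."""


def versioned_path(filepath: Path, version: str) -> Path:
    # Kedro-style layout: <filepath>/<version>/<basename>
    return filepath / version / filepath.name


def latest_version(filepath: Path) -> str:
    # List every <filepath>/*/<basename>; timestamp-like versions sort lexically.
    versions = sorted(p.parent.name for p in filepath.glob(f"*/{filepath.name}"))
    if not versions:
        raise VersionNotFound(f"Did not find any versions for {filepath}")
    return versions[-1]


def save(filepath: Path, text: str, version: str) -> str:
    target = versioned_path(filepath, version)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    # Assumption: the latest version is resolved *after* the write, so a failed
    # lookup here raises even though the new file is already on disk.
    return latest_version(filepath)
```

In this sketch the error is a symptom of the listing step, not of the write, which would match "it fails but still saves".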

More info: <https://kedro.org/blog/managed-delta-tables-kedro-dataset>

How can I create versions of a file on HDFS? I don't want the file to be appended or upserted; every run should create a new file tagged with the latest timestamp.

I'd imagine it works exactly the same as on the local filesystem. You can always implement your own versioning scheme by templating a value into the directory path, e.g. `some_path/{version}/dataset.parquet`.
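For the templated approach, a minimal sketch of filling `{version}` with a per-run UTC timestamp. The timestamp format and the helper names `run_version` and `path_for_run` are my own choices (hypothetical), not something Kedro provides. A fixed-width timestamp like this sorts lexically in chronological order, so "latest" is just the max string.

```python
from datetime import datetime, timezone
from typing import Optional

# Template from the thread; swap in an hdfs:// prefix for HDFS.
TEMPLATE = "some_path/{version}/dataset.parquet"


def run_version(now: Optional[datetime] = None) -> str:
    """Return a fixed-width UTC timestamp that sorts in chronological order."""
    now = now or datetime.now(tz=timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%fZ")


def path_for_run(template: str = TEMPLATE, version: Optional[str] = None) -> str:
    # Each run gets its own directory, so nothing is appended or upserted.
    return template.format(version=version or run_version())
```

The sketch only builds the path string; whether you feed it to Spark directly or to a catalog entry is up to you, and the logic is the same for local paths and HDFS.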

That said, I agree that Delta offers native versioning, which is more efficient and could be the better choice. Note that CLI arguments such as `--load-version` assume the Kedro versioning scheme, so they won't work with any native versioning. But again, you can use a templated value to get around that.
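Since `--load-version` won't help with a custom template, one option is to resolve the version yourself: use an explicitly requested version if there is one (say, passed in as a run parameter), otherwise fall back to the newest. A minimal sketch, assuming versions are sortable timestamp strings and that you already have the list of existing version folders (the listing step is left out). `resolve_load_path` is a hypothetical helper, not part of Kedro.

```python
from typing import Iterable, Optional

TEMPLATE = "some_path/{version}/dataset.parquet"  # template from the thread


def resolve_load_path(
    available: Iterable[str],
    requested: Optional[str] = None,
    template: str = TEMPLATE,
) -> str:
    versions = sorted(available)
    if not versions:
        raise LookupError("no versions available")
    if requested is None:
        version = versions[-1]  # newest, assuming timestamps sort lexically
    elif requested in versions:
        version = requested  # pinned version, the role --load-version plays
    else:
        raise LookupError(f"version {requested!r} not found")
    return template.format(version=version)
```

Failing loudly on an unknown pinned version, rather than silently falling back to the latest, keeps reruns reproducible.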