# questions
d
Hi Team, has anyone tried using versioning on `SparkDataSet`? I'm trying to version a CSV. Funny thing is that it fails with a `VersionNotFoundError`, but it still saves the new version. Can anyone suggest any ideas?
d
Versioning and Spark don't co-exist.
Use Delta if you want that.
j
d
How can I create versions of a file on HDFS? I don't want the file to be appended to or upserted; on every run, a new file should be created with the latest timestamp.
Is that possible?
n
I'd imagine that works exactly the same as on a local filesystem. You can always impose your own versioning scheme by templating a value into the directory path, e.g. `some_path/{version}/dataset.parquet`.
Moreover, I agree that Delta offers native versioning (more efficient) and could be a better choice. Note that CLI arguments such as `--load-version` assume the Kedro versioning scheme, so they won't work with any native versioning. But again, you can use a templated value to get around that.
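A rough sketch of the templating idea, in case it helps. It only builds the path string; `make_version` and `versioned_path` are hypothetical helper names, and the timestamp layout is an assumption chosen so that versions sort chronologically and contain no colons. How you feed the result into your catalog depends on your Kedro setup.

```python
from datetime import datetime, timezone

# Template taken from the message above; {version} is filled in per run.
PATH_TEMPLATE = "some_path/{version}/dataset.parquet"


def make_version(now=None):
    """Return a sortable UTC timestamp string with millisecond precision."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Dots instead of colons keep the string safe to use as a directory name.
    return now.strftime("%Y-%m-%dT%H.%M.%S.") + f"{now.microsecond // 1000:03d}Z"


def versioned_path(template=PATH_TEMPLATE, version=None):
    """Fill the {version} placeholder; use a fresh timestamp if none is given."""
    return template.format(version=version or make_version())


if __name__ == "__main__":
    # Each run writes to a new directory, so nothing is appended or overwritten.
    print(versioned_path())
    # To read an older run, pass its version string explicitly.
    print(versioned_path(version="2024-01-02T03.04.05.678Z"))
```

Since every run gets its own directory, the same approach should work on HDFS as on a local filesystem, as long as the prefix in the template points at the right location.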