#questions

Debanjan Banerjee

11/16/2023, 12:31 PM
Hi Team, has anyone tried using versioning on `SparkDataSet`? I'm trying to version a CSV. The funny thing is that it fails with a `VersionNotFoundError`, but it still saves the new version. Can anyone suggest any ideas?

datajoely

11/16/2023, 12:33 PM
Versioning and Spark don't co-exist.
Use Delta if you want that.

Juan Luis

11/16/2023, 1:10 PM

Debanjan Banerjee

11/16/2023, 1:42 PM
How can I create versions of a file on HDFS? I don't want the file to be appended to or upserted; instead, every run should create a new file with the latest timestamp.
Is that possible?

Nok Lam Chan

11/17/2023, 8:55 AM
I imagine that would be exactly the same as on a local filesystem. You can always define your own versioning scheme by templating a value into the directory path, e.g. `some_path/{version}/dataset.parquet`.
Moreover, I agree that Delta offers native versioning (more efficient) and could be a better choice. Note that CLI arguments such as `--load-version` assume the Kedro versioning scheme, so they won't work with any native versioning. But again, you can use a templated value to get around that.
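
A minimal sketch of the templated-directory idea, assuming a UTC timestamp is used as the version string. The helper names `new_version` and `versioned_path` and the timestamp format are hypothetical and are not part of Kedro's API:

```python
from datetime import datetime, timezone
from typing import Optional

# Template taken from the message above; {version} is filled in per run.
PATH_TEMPLATE = "some_path/{version}/dataset.parquet"


def new_version(now: Optional[datetime] = None) -> str:
    """Return a sortable UTC timestamp to use as a version string.

    The format is an assumption chosen for illustration.
    """
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return moment.strftime("%Y-%m-%dT%H.%M.%S.%fZ")


def versioned_path(template: str = PATH_TEMPLATE, version: Optional[str] = None) -> str:
    """Fill the template with an explicit version, or a fresh timestamp."""
    return template.format(version=version or new_version())


# Each run without an explicit version gets its own directory.
save_path = versioned_path()

# Loading an older run means passing its version string back in.
load_path = versioned_path(version="2023-11-17T08.55.00.000000Z")
```

The resulting path could then be handed to whatever writes the data, for example `df.write.parquet(save_path)` in PySpark, so every run lands in a new directory rather than appending to an existing one.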