#questions

tom kurian

12/06/2023, 2:07 PM
Hi all, while saving a versioned Spark parquet dataset, I am hitting the following error:
VersionNotFoundError: Did not find any versions for SparkDataset(file_format=parquet, filepath=s3://bucket/folder/file_name, 
load_args={'header': True}, save_args={'mode': overwrite}, version=Version(load=None, save='2023-12-06T13.06.41.920Z'))
Config file:
_pq: &_pq
  type: spark.SparkDataSet
  file_format: parquet
  versioned: True
  load_args:
    header: True
  save_args:
    mode: overwrite

model_input.narrow_master.narrow_master:
  filepath: ${base}/model_input/master
  <<: *_pq
Kedro versions:
kedro                            0.18.14
kedro-datasets                   1.8.0
What am I doing wrong here?
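For context, `load=None` in the error means no load version was pinned, so the dataset has to discover the latest existing version at the path. The snippet is a simplified, hypothetical model of that lookup, not Kedro's actual code: it assumes the versioned layout `<filepath>/<version>/<basename>` and raises when nothing matches, which is the situation the error message describes. `versioned_path`, `latest_load_version` and `VersionNotFound` are illustrative names.

```python
import tempfile
from pathlib import Path


class VersionNotFound(Exception):
    """Hypothetical stand-in for Kedro's VersionNotFoundError."""


def versioned_path(filepath: Path, version: str) -> Path:
    # Assumed versioned layout: <filepath>/<version>/<basename of filepath>
    return filepath / version / filepath.name


def latest_load_version(filepath: Path) -> str:
    """Return the newest version under `filepath`, as an unpinned load would."""
    # Version stamps are zero-padded, so reverse lexical order is newest first.
    candidates = sorted(filepath.glob(f"*/{filepath.name}"), reverse=True)
    for candidate in candidates:
        if candidate.exists():
            # The version string is the name of the parent directory.
            return candidate.parent.name
    raise VersionNotFound(f"Did not find any versions for {filepath}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "master"
        master.mkdir()
        try:
            latest_load_version(master)
        except VersionNotFound as err:
            print(err)  # nothing saved yet, so no version can be resolved
        for version in ("2023-12-05T09.00.00.000Z", "2023-12-06T13.06.41.920Z"):
            versioned_path(master, version).mkdir(parents=True)
        print(latest_load_version(master))  # 2023-12-06T13.06.41.920Z
```

For a Spark parquet save, the final `<basename>` entry is itself a directory of part files; `Path.exists()` is true for directories, so the sketch treats both cases the same way.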

Deepyaman Datta

12/06/2023, 2:11 PM
The filepaths in your log and your catalog entry don't match in your sanitized example, so it's hard to tell. Also, can you share which files exist under that path? If I had to guess with this incomplete information, I think you saved a non-versioned dataset there before, didn't clean it up, and are now trying to save a versioned dataset, so Kedro gets confused by the existing file structure. 🙂
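To check a local copy for leftovers like that, a small hypothetical helper such as `find_unversioned_leftovers` could list anything directly under the dataset path that is not a `<version>/<basename>` folder. It assumes the `YYYY-MM-DDThh.mm.ss.sssZ` version format seen in the error and only handles local paths; for S3 you would need to list the prefix with your own tooling.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Assumed version-stamp format, e.g. "2023-12-06T13.06.41.920Z".
VERSION_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}\.\d{2}\.\d{2}\.\d{3}Z")


def find_unversioned_leftovers(filepath: Path) -> list[str]:
    """Return names at `filepath` that do not fit the versioned layout.

    A clean versioned dataset should contain only <version>/<basename>
    entries; anything else (for example part files or a _SUCCESS marker
    from an earlier non-versioned Spark save) is reported.
    """
    if filepath.is_file():
        return [filepath.name]  # a plain file where version folders should be
    if not filepath.is_dir():
        return []  # nothing saved at this path yet
    leftovers = []
    for entry in sorted(filepath.iterdir()):
        looks_versioned = (
            entry.is_dir()
            and VERSION_RE.fullmatch(entry.name) is not None
            and (entry / filepath.name).exists()
        )
        if not looks_versioned:
            leftovers.append(entry.name)
    return leftovers


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "master"
        (master / "2023-12-06T13.06.41.920Z" / "master").mkdir(parents=True)
        # Simulate output left behind by an earlier non-versioned save.
        (master / "_SUCCESS").touch()
        (master / "part-00000.parquet").touch()
        print(find_unversioned_leftovers(master))  # ['_SUCCESS', 'part-00000.parquet']
```

An empty list means the path looks like a clean versioned dataset; entries such as `_SUCCESS` or `part-*.parquet` would point to an earlier non-versioned save.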

tom kurian

12/06/2023, 2:13 PM
I changed the file paths due to privacy concerns.
Let me check the path.

Deepyaman Datta

12/06/2023, 2:15 PM
> file paths I changed due to privacy issue,
Yes, but they don't match. 🙂 If you can sanitize them in a way that still mirrors the actual structure, that would help, because the issue is quite likely related to the file paths or to the files already at that path.
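One way to sanitize paths while keeping them analogous is to map each real path segment to a stable placeholder, so the same bucket or folder name gets the same replacement in both the log and the catalog entry. This is a minimal, hypothetical sketch (`make_sanitizer` is not a Kedro function); it keeps the URL scheme and `${...}` template segments as they are.

```python
from typing import Callable, Dict


def make_sanitizer() -> Callable[[str], str]:
    """Return a function that swaps real path segments for stable placeholders.

    The same real segment always maps to the same placeholder, so a path in a
    log and the matching catalog filepath stay recognizably identical.
    """
    mapping: Dict[str, str] = {}

    def sanitize(path: str) -> str:
        scheme, sep, rest = path.partition("://")
        if not sep:  # no scheme such as "s3://" in this path
            rest = path
        parts = []
        for segment in rest.split("/"):
            # Keep empty segments and ${...} templates untouched.
            if segment == "" or segment.startswith("${"):
                parts.append(segment)
                continue
            if segment not in mapping:
                mapping[segment] = f"seg{len(mapping) + 1}"
            parts.append(mapping[segment])
        prefix = f"{scheme}://" if sep else ""
        return prefix + "/".join(parts)

    return sanitize


if __name__ == "__main__":
    sanitize = make_sanitizer()
    print(sanitize("s3://acme-data/projects/churn/model_input/master"))  # s3://seg1/seg2/seg3/seg4/seg5
    print(sanitize("${base}/model_input/master"))  # ${base}/seg4/seg5
```

Placeholders are assigned in first-seen order, so run the log path and the catalog path through the same sanitizer instance to keep them aligned.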
Same problem 👆🏻

Deepyaman Datta

12/06/2023, 3:05 PM
@marrrcin I'm not sure about that. My memory is hazy, but I think we used to version all our SparkDataset instances on projects back when I used it years ago. Not sure if something has fundamentally changed. To double-check, I also looked through the issue tracker and found tests for versioning on different filesystems in https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/tests/spark/test_spark_dataset.py. So I personally do think it should work. 🙂

marrrcin

12/06/2023, 9:07 PM
@datajoely so which is it?

datajoely

12/06/2023, 10:24 PM
I’m not entirely sure. I thought it wasn’t compatible.