Kedro #questions: Hi all, while saving a Spark parquet versioned dataset I am hitting the following error


tom kurian

12/06/2023, 2:07 PM

Hi all, while saving a Spark parquet versioned dataset I am hitting the following error:

```
VersionNotFoundError: Did not find any versions for SparkDataset(file_format=parquet, filepath=<s3://bucket/folder/file_name>,
load_args={'header': True}, save_args={'mode': overwrite}, version=Version(load=None, save='2023-12-06T13.06.41.920Z'))
```
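
For readers hitting the same message: roughly, in Kedro 0.18.x a versioned save writes the data, then resolves the latest load version by globbing `<filepath>/*/<last path component>`, and raises `VersionNotFoundError` if that glob finds nothing. The stdlib sketch below models only that last check on a plain list of bucket-relative object keys. `find_versions`, `resolve_latest_version`, and the local `VersionNotFoundError` class are hypothetical stand-ins, not Kedro's code; the real matching goes through the filesystem's own glob.

```python
from pathlib import PurePosixPath


class VersionNotFoundError(Exception):
    """Local stand-in for Kedro's exception of the same name (hypothetical here)."""


def find_versions(filepath, keys):
    """Collect version folder names laid out as <filepath>/<version>/<name>/...

    `keys` are bucket-relative object keys (no s3:// prefix); this loop only
    approximates the glob Kedro runs on the real filesystem.
    """
    base = PurePosixPath(filepath)
    n = len(base.parts)
    versions = set()
    for key in keys:
        parts = PurePosixPath(key).parts
        if len(parts) >= n + 2 and parts[:n] == base.parts and parts[n + 1] == base.name:
            versions.add(parts[n])
    return versions


def resolve_latest_version(filepath, keys):
    """Return the newest version, or raise the error seen in this thread."""
    versions = find_versions(filepath, keys)
    if not versions:
        raise VersionNotFoundError(f"Did not find any versions for {filepath}")
    # Timestamps like 2023-12-06T13.06.41.920Z sort chronologically as plain strings.
    return max(versions)


keys = [
    "folder/master/2023-12-06T13.06.41.920Z/master/part-00000.parquet",
    "folder/master/2023-12-06T13.06.41.920Z/master/_SUCCESS",
]
print(resolve_latest_version("folder/master", keys))  # 2023-12-06T13.06.41.920Z

# Only non-versioned files at the path: nothing matches, so this raises.
try:
    resolve_latest_version("folder/master", ["folder/master/part-00000.parquet"])
except VersionNotFoundError as err:
    print(err)  # Did not find any versions for folder/master
```

Under this simplified model, getting the error during a save means the lookup did not see the version folder that was just written, which is why the actual contents of the path matter.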

Config file:

```yaml
_pq: &_pq
  type: spark.SparkDataSet
  file_format: parquet
  versioned: True
  load_args:
    header: True
  save_args:
    mode: overwrite

model_input.narrow_master.narrow_master:
  filepath: ${base}/model_input/master
  <<: *_pq
```
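
With `versioned: True`, each save goes to a timestamped subfolder under `filepath` rather than to `filepath` itself. Below is a minimal sketch of that layout. It assumes the `<filepath>/<version>/<last path component>` convention used by Kedro's versioned datasets and the millisecond UTC timestamp format visible in the error above. `make_version` and `versioned_path` are hypothetical helpers, not Kedro API, and `base/...` stands in for `${base}`.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def make_version(now=None):
    """UTC timestamp like 2023-12-06T13.06.41.920Z (millisecond precision)."""
    if now is None:
        now = datetime.now(tz=timezone.utc)
    ts = now.strftime("%Y-%m-%dT%H.%M.%S.%fZ")  # %f gives microseconds: ...41.920000Z
    return ts[:-4] + ts[-1:]  # drop the last 3 digits of %f: ...41.920Z


def versioned_path(filepath, version):
    """Where a versioned save lands: <filepath>/<version>/<last path component>."""
    base = PurePosixPath(filepath)
    return str(base / version / base.name)


version = make_version(datetime(2023, 12, 6, 13, 6, 41, 920000, tzinfo=timezone.utc))
print(version)  # 2023-12-06T13.06.41.920Z
print(versioned_path("base/model_input/master", version))
# base/model_input/master/2023-12-06T13.06.41.920Z/master
```

So a Spark write for this catalog entry produces its part files under `.../master/<version>/master/`, and anything written earlier directly under `.../master/` sits alongside those version folders.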

```
kedro                            0.18.14
kedro-datasets                   1.8.0
```

Kedro versions are above. What am I doing wrong here?

Deepyaman Datta

12/06/2023, 2:11 PM

Your filepaths in the logs and the catalog entry don't match in your sanitized example, so it's hard to tell. Also, can you share what files are available under the path? If I had to guess with this incomplete information, I think you saved a non-versioned dataset before, didn't clean it up, and are now trying to save a versioned dataset, and Kedro is getting confused by the existing file structure. 🙂
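
To answer the "what files are under the path?" question, one quick check is to split the listing into version folders and everything else. This sketch assumes you already have the bucket-relative object keys from your own S3 listing. `audit_dataset_path` is a hypothetical helper, and the version-name regex assumes the timestamp format shown in the error above.

```python
import re
from pathlib import PurePosixPath

# Matches version folder names such as 2023-12-06T13.06.41.920Z
VERSION_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}\.\d{2}\.\d{2}\.\d{3}Z$")


def audit_dataset_path(filepath, keys):
    """Split keys under `filepath` into versioned saves and other (leftover) files."""
    base = PurePosixPath(filepath)
    n = len(base.parts)
    versioned, leftovers = [], []
    for key in keys:
        parts = PurePosixPath(key).parts
        if parts[:n] != base.parts or len(parts) == n:
            continue  # not under this dataset's path
        is_versioned = (
            len(parts) >= n + 2
            and VERSION_RE.match(parts[n]) is not None
            and parts[n + 1] == base.name
        )
        (versioned if is_versioned else leftovers).append(key)
    return versioned, leftovers


keys = [
    "folder/master/part-00000.parquet",  # e.g. from an earlier non-versioned save
    "folder/master/_SUCCESS",
    "folder/master/2023-12-06T13.06.41.920Z/master/part-00000.parquet",
]
versioned, leftovers = audit_dataset_path("folder/master", keys)
print(leftovers)  # ['folder/master/part-00000.parquet', 'folder/master/_SUCCESS']
```

A non-empty `leftovers` list fits the scenario guessed at above. Whether those files actually trigger the error depends on how Kedro's glob behaves on that filesystem.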

tom kurian

12/06/2023, 2:13 PM

I changed the file paths due to privacy issues.

tom kurian

12/06/2023, 2:13 PM

let me check the path

Deepyaman Datta

12/06/2023, 2:15 PM

> file paths I changed due to privacy issue,

Yes, but they're not matching. 🙂 If you can sanitize it in a way that's still analogous to the actual structure, that would be helpful, because the issue is quite likely related to the file paths/existing files at that path.
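
One way to sanitize paths while keeping them analogous to the real structure is to replace each path segment with a consistent placeholder. That way a repeated segment, such as the dataset name appearing both as the folder and inside each version folder, maps to the same placeholder. This is a minimal sketch: `sanitize_keys` is a hypothetical helper, and keeping version timestamps, Spark `part-*` files, and `_`-prefixed marker files like `_SUCCESS` unchanged is an assumption about what is safe to share.

```python
import re

# Version folder names such as 2023-12-06T13.06.41.920Z carry no private info.
VERSION_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}\.\d{2}\.\d{2}\.\d{3}Z$")


def sanitize_keys(keys):
    """Replace private path segments with stable placeholders (seg0, seg1, ...)."""
    mapping = {}
    sanitized = []
    for key in keys:
        parts = []
        for part in key.split("/"):
            if VERSION_RE.match(part) or part.startswith(("part-", "_")):
                parts.append(part)  # keep structural names as-is
            else:
                # setdefault: the same original segment always gets the same placeholder
                parts.append(mapping.setdefault(part, f"seg{len(mapping)}"))
        sanitized.append("/".join(parts))
    return sanitized


print(sanitize_keys([
    "acme-prod/secret/master/2023-12-06T13.06.41.920Z/master/part-00000.parquet",
    "acme-prod/secret/master/_SUCCESS",
]))
# ['seg0/seg1/seg2/2023-12-06T13.06.41.920Z/seg2/part-00000.parquet', 'seg0/seg1/seg2/_SUCCESS']
```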

marrrcin

12/06/2023, 2:42 PM

https://kedro-org.slack.com/archives/C03RKP2LW64/p1700137981721939?thread_ts=1700137866.087719&channel=C03RKP2LW64&message_ts=1700137981.721939

marrrcin

12/06/2023, 2:42 PM

Same problem 👆🏻

Deepyaman Datta

12/06/2023, 3:05 PM

@marrrcin I'm not confident about that. My memory is hazy, but I think we used to version all our `SparkDataset` instances on projects back when I used it years ago. I'm not sure if something has fundamentally changed. To double-check, I also looked through the issue tracker, and I saw tests for versioning on different filesystems in https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/tests/spark/test_spark_dataset.py. So I personally do think it should work. 🙂

marrrcin

12/06/2023, 9:07 PM

@datajoely how is it then?

datajoely

12/06/2023, 10:24 PM

I’m not entirely sure; I thought it wasn’t compatible.
