Yuchu Liu
10/25/2022, 1:40 PM

Deepyaman Datta
10/25/2022, 1:45 PM

Yuchu Liu
10/25/2022, 2:00 PM
data_example:
  <<: *sp_pq
  filepath: data/filename
  layer: cleaned
3.
2022-10-25 15:02:07,057 - kedro.io.data_catalog - INFO - Saving data to `filename` (SparkDataSet)...
22/10/25 13:02:07 ERROR Utils: Aborting task
java.io.FileNotFoundException:
/Users/mynamehere/Documents/project_folder/data/filename (Is a directory)
It is possible the underlying files have been updated. You can explicitly invalidate
the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by
recreating the Dataset/DataFrame involved.
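(A rough sketch of what that catalog entry amounts to programmatically, assuming the sp_pq anchor expands to a parquet-format spark.SparkDataSet and using the Kedro 0.18-era import path; the anchor's definition isn't shown in this thread, so file_format and save_args below are assumptions, and the toy DataFrame is only for illustration.)

from pyspark.sql import SparkSession
from kedro.extras.datasets.spark import SparkDataSet

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Assumed equivalent of the `data_example` entry once `<<: *sp_pq` is merged in.
data_example = SparkDataSet(
    filepath="data/filename",
    file_format="parquet",            # assumed from the sp_pq / sp_parquet naming
    save_args={"mode": "overwrite"},  # assumed; the template reportedly sets this
)

# Spark writes parquet output as a directory of part files at data/filename,
# not as a single file.
data_example.save(df)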
Deepyaman Datta
10/25/2022, 2:03 PM
test.parquet? That would be more equivalent. Similarly, what if you do df.write to "data/filename"?
Yuchu Liu
10/25/2022, 2:04 PM

Deepyaman Datta
10/25/2022, 2:08 PM
save_args:
  mode: overwrite
sp_parquet includes that already
Yuchu Liu
10/25/2022, 2:41 PM
save_args:
  mode: overwrite
  header: true
  sep: ','
  decimal: '.'
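(For context on where those save_args go: as far as I understand, SparkDataSet forwards the dictionary to Spark's DataFrameWriter.save as keyword arguments, so the sketch below is roughly what the entry above turns into; header, sep, and decimal look like CSV-writer options, so they likely have no effect on a parquet write.)

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Roughly how the save_args above reach Spark: each key becomes a keyword
# argument to DataFrameWriter.save. For a parquet write, header/sep/decimal
# are CSV-style options and likely do nothing.
df.write.save("data/filename", "parquet", mode="overwrite", header=True, sep=",", decimal=".")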
Deepyaman Datta
10/25/2022, 3:12 PM

Rabeez Riaz
10/26/2022, 11:25 AM

Deepyaman Datta
10/26/2022, 1:11 PM

Rabeez Riaz
10/26/2022, 1:25 PM
conda list and pip list were only showing one of the versions, so we didn’t pick up on the issue initially

Deepyaman Datta
10/26/2022, 2:01 PM
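(A quick way to confirm which copy of a package actually gets imported when two versions end up side by side and conda list / pip list each report only one of them; pyspark here is just an example, since the thread doesn't name the duplicated package.)

import pyspark

# The version and install location Python actually resolves at import time,
# regardless of what conda list / pip list report.
print(pyspark.__version__)
print(pyspark.__file__)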