#questions

Rob

12/26/2022, 4:21 PM
Hi everyone and happy holidays. I recently started using Kedro and wanted to look at its workflow with Spark, so I'm testing it with the `pyspark-iris` starter. I've already set up Spark 3.0 on my Windows machine and it's working, but I'm getting this `DataSetError`:
DataSetError: Failed while saving data to data set 
SparkDataSet(file_format=parquet, 
filepath=C:/Users/rober/PycharmProjects/pyspark-test/data/02_intermediate/X_train.parquet, load_args={'header': True, 'inferSchema': True}, 
save_args={'header': True, 'mode': overwrite}).
An error occurred while calling o60.save.
I already checked the `copy_mode` of the `MemoryDataSet` config inside `catalog.yml` and it's set to `assign`. Since no actions are executed in the previous node, I guess it's the only saving mode. It's probably something simple, but if someone can help me, I'd appreciate it.
Never mind, I just added `hadoop.dll` to `%HADOOP_HOME%/bin` alongside `winutils.exe` and it worked ✌🏻
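For anyone hitting the same error, here is a minimal sketch of a pre-flight check, assuming the cause is the same as above (a file missing from `%HADOOP_HOME%/bin`). The helper name `missing_hadoop_binaries` is hypothetical and not part of Kedro or PySpark; it only looks for the two files mentioned in this thread.

```python
import os
from pathlib import Path

# The two native files mentioned in this thread; other setups may need more.
REQUIRED_BINARIES = ("winutils.exe", "hadoop.dll")


def missing_hadoop_binaries(hadoop_home=None):
    """Return the required files that are missing from <hadoop_home>/bin.

    Hypothetical helper: falls back to the HADOOP_HOME environment
    variable when no path is passed in.
    """
    home = hadoop_home or os.environ.get("HADOOP_HOME")
    if not home:
        # HADOOP_HOME is unset, so none of the files can be located.
        return list(REQUIRED_BINARIES)
    bin_dir = Path(home) / "bin"
    return [name for name in REQUIRED_BINARIES if not (bin_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_hadoop_binaries()
    if missing:
        print("Missing from %HADOOP_HOME%/bin:", ", ".join(missing))
    else:
        print("winutils.exe and hadoop.dll are both present")
```

Running it before `kedro run` should show whether either file is absent. It does not verify that the files match the installed Hadoop version.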

Sebastian Pehle

12/26/2022, 9:28 PM
this solution seems obvious ;)