JOEL WILSON
02/07/2023, 7:15 AM
pyarrow==0.14.0
java version "1.8.0_341"
Java(TM) SE Runtime Environment (build 1.8.0_341-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.341-b10, mixed mode)
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/
Using Scala version 2.12.15, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_341
Branch HEAD
Compiled by user yumwang on 2022-10-15T09:47:01Z
Revision fbbcf9434ac070dd4ced4fb9efe32899c6db12a9
Url <https://github.com/apache/spark>
Filip Panovski
02/07/2023, 9:27 AM
"Method <...> does not exist" in py4j is likely the equivalent of NoSuchMethodError in Java. This likely means that your version of Spark is expecting a different version of the Python library than what is available at runtime, so you should check which version Spark is expecting and which is actually being used. If there's a discrepancy, fixing this will probably fix your problem.
pyarrow==0.14.1 or 1.0.0?
See also: https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html#recommended-pandas-and-pyarrow-versions
JOEL WILSON
02/07/2023, 10:35 AM
1.0.0 but same error. Also tried updating the spark/bin/conf/spark-defaults with spark.sql.execution.arrow.pyspark.enabled=false but still same error.
2023-02-07 13:32:29,274 - py.warnings - WARNING - createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by the reason below:
An error occurred while calling z:org.apache.spark.sql.api.python.PythonSQLUtils.readArrowStreamFromFile. Trace:
py4j.Py4JException: Method readArrowStreamFromFile([class org.apache.spark.sql.SQLContext, class java.lang.String]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
at py4j.Gateway.invoke(Gateway.java:276)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Unknown Source)
Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
2023-02-07 13:32:29,275 - py.warnings - WARNING - iteritems is deprecated and will be removed in a future version. Use .items instead.
2023-02-07 13:32:29,429 - kedro.io.data_catalog - INFO - Saving data to 'ftr_account_customer_month_spine' (SparkDataSet)...
23/02/07 13:32:53 ERROR FileFormatWriter: Aborting job 5f8a8e57-29de-46a3-899d-195f59b90171.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
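Editor's sketch of the two checks discussed above: which Python package versions are actually installed, and whether the Arrow setting reaches the session. The warning in the log still reports 'spark.sql.execution.arrow.pyspark.enabled' as true, so the spark-defaults edit may not have been picked up. The helper names (installed_version, same_major_minor, build_session, report) are hypothetical, and treating a pyspark/Spark mismatch as the cause is an assumption to rule out, not a confirmed diagnosis.

```python
from importlib import metadata

# Spark version printed by the banner earlier in this thread.
SPARK_JVM_VERSION = "3.3.1"

# Setting quoted in the thread, applied per session instead of via spark-defaults.
ARROW_OFF = {"spark.sql.execution.arrow.pyspark.enabled": "false"}


def installed_version(package):
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def same_major_minor(a, b):
    """True when two version strings share major.minor, e.g. '3.3.1' and '3.3.0'."""
    return a.split(".")[:2] == b.split(".")[:2]


def build_session(app_name="arrow-check"):
    """Create (or reuse) a SparkSession with Arrow conversion disabled."""
    # Imported lazily so the version checks still run where pyspark is missing.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in ARROW_OFF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def report():
    """Print installed versions and flag a pyspark / Spark mismatch."""
    for pkg in ("pyspark", "pyarrow", "pandas"):
        print(pkg, installed_version(pkg))
    py = installed_version("pyspark")
    if py is not None and not same_major_minor(py, SPARK_JVM_VERSION):
        print("pyspark", py, "does not match Spark", SPARK_JVM_VERSION)
```

Calling report() in the same environment that runs the pipeline shows what is really installed. After build_session(), spark.conf.get("spark.sql.execution.arrow.pyspark.enabled") should return "false" if the setting took effect. In a Kedro project the session is usually created elsewhere, so the setting would need to go wherever that session is built.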
Filip Panovski
02/07/2023, 10:59 AM
Ben Horsburgh
02/07/2023, 1:00 PM
C:\Users when overwriting during save https://stackoverflow.com/questions/51561061/scala-spark-overwrite-parquet-file-failed-to-delete-file-or-dir