# questions
Hi Team... we are trying to run a PySpark job on a Dataproc cluster. The following steps were taken (please refer to the screenshot):
1. A wheel file was generated for the project.
2. The wheel file and the conf and logs folders/files were pushed onto the Dataproc cluster.
3. pip install the wheel.
4. Run kedro.
When running kedro, it throws the error below. Can you please help with what we are missing here?
ERROR org.apache.spark.SparkContext: Error initializing SparkContext. org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/.sparkStaging/application_1677266242748_0002/pyspark.zip could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
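The "could only be written to 0 of the 1 minReplication nodes... 0 datanode(s) running" message comes from HDFS while YARN stages pyspark.zip, which points at the cluster's DataNodes rather than at the Kedro project itself. A minimal sketch to isolate that, assuming pyspark is installed on the Dataproc master node and the job runs on YARN (the app name is hypothetical):

```python
# Minimal sketch: create a SparkSession directly, outside Kedro, to check
# whether SparkContext initialization fails the same way on this cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("staging-smoke-test")  # hypothetical name, anything works
    .master("yarn")                 # same mode a kedro run on Dataproc would use
    .getOrCreate()
)

# If this also fails with the .sparkStaging / minReplication error,
# HDFS has no live DataNodes and the problem is cluster-side, not Kedro.
print(spark.range(10).count())
spark.stop()
```

If the bare SparkSession comes up fine, the next place to look is the project's Spark configuration (for example conf/base/spark.yml in a typical Kedro PySpark setup); if it fails the same way, checking DataNode health on the cluster (for example with hdfs dfsadmin -report on the master node) would be the next step.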
@Deepyaman Datta Hi Deepyaman... any pointers on this, please?
Doesn't look like a Kedro issue to me, sorry.