#questions

Andreas_Kokolantonakis

05/09/2023, 2:51 PM
Hi everyone, I am facing the following issue when I am trying to read a CSV with spark. With pandas, it works fine but with spark, it seems I need some extra configurations. Could you please point me in the right direction? thank you in advance!

Juan Luis

05/09/2023, 2:57 PM
welcome @Andreas_Kokolantonakis! I think the file was deleted and cannot be seen

Andreas_Kokolantonakis

05/09/2023, 2:57 PM
yep sorry
Screenshot 2023-05-09 at 17.57.33.png

Javier del Villar

05/09/2023, 3:03 PM
you should add s3a:// to the filepath
sorry! Hi! hehehe
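To illustrate the suggestion above: Spark reads S3 through the Hadoop `s3a` connector, so a path that works with pandas as `s3://...` generally needs the `s3a://` scheme for Spark. The helper below is only a minimal sketch; the function name `to_s3a` and the bucket and key are hypothetical and do not come from the thread.

```python
def to_s3a(path: str) -> str:
    """Rewrite a legacy s3:// or s3n:// prefix to s3a://; leave other paths alone."""
    for legacy in ("s3://", "s3n://"):
        if path.startswith(legacy):
            return "s3a://" + path[len(legacy):]
    return path


# Hypothetical bucket and key, for illustration only.
spark_path = to_s3a("s3://my-bucket/data/01_raw/example.csv")
print(spark_path)
```

The resulting string is what would go into the `filepath` of the Spark dataset; local paths pass through unchanged.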

Andreas_Kokolantonakis

05/09/2023, 5:06 PM
tried all the recommended approaches, still getting the same error
I used context.py to load a new session with s3 configurations. no luck, is anyone able to help me further please?! thanks

Juan Luis

05/09/2023, 5:06 PM
could you paste the traceback that corresponds to the s3a:// protocol @Andreas_Kokolantonakis?

Andreas_Kokolantonakis

05/09/2023, 5:08 PM
@Juan Luis
I used the pyspark starter, and added the S3AFileSystem as a configuration in the spark.yml file
but still no luck

Javier del Villar

05/09/2023, 5:16 PM
You have a different error now: you are missing the AWS connector in your Spark classpath

Juan Luis

05/09/2023, 5:16 PM
the error is slightly different now indeed:
Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
(leaving it written for easier copy-paste)
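A "Class ... S3AFileSystem not found" error usually means the `hadoop-aws` jar is not on Spark's classpath. One common way to pull it in is the `spark.jars.packages` option, which has to be set before the session is created. The sketch below is an assumption about how that could look, not the configuration used in this thread: the function names are hypothetical, the `hadoop-aws` version must match the Hadoop version bundled with your Spark build, and credentials are not handled here.

```python
from typing import Dict


def s3a_spark_options(hadoop_aws_version: str) -> Dict[str, str]:
    """Spark options that put the S3A connector on the classpath.

    hadoop_aws_version is supplied by the caller and should match the
    Hadoop version that ships with the installed Spark distribution.
    """
    return {
        # Downloads hadoop-aws (and its AWS SDK dependency) at session start.
        "spark.jars.packages": f"org.apache.hadoop:hadoop-aws:{hadoop_aws_version}",
        # Maps the s3a:// scheme to the S3A filesystem implementation.
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }


def build_session(options: Dict[str, str], app_name: str = "s3a-example"):
    """Create a SparkSession with the given options (needs pyspark installed)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    builder = SparkSession.builder.appName(app_name)
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# "3.3.4" is only an example value; check your own Spark/Hadoop versions.
options = s3a_spark_options("3.3.4")
print(options)
```

The same two keys can be written as entries in spark.yml instead, as long as that file is actually read before the session is created.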

Andreas_Kokolantonakis

05/09/2023, 5:18 PM
yep, but I am adding this config in spark.yml. How can I make sure that kedro will use ProjectContext(KedroContext) to initialise the spark session?

Javier del Villar

05/09/2023, 5:18 PM
searching the channel for that error I got: https://kedro-org.slack.com/archives/C03RKP2LW64/p1681471022814169

Andreas_Kokolantonakis

05/09/2023, 5:19 PM
will try to follow this!
hi everyone, how can I make sure that kedro will use spark.yml and context.py when initializing a spark session? does it work out of the box, or do I need to point at it somehow? thanks!

Juan Luis

05/10/2023, 7:56 AM
hi everyone, how can I make sure that kedro will use spark.yml and context.py when initializing a spark session? does it work out of the box, or do I need to point at it somehow? thanks!

Nok Lam Chan

05/10/2023, 9:52 AM
@Andreas_Kokolantonakis What version of kedro are you using? If you are using 0.18.x you should be able to do this with https://docs.kedro.org/en/stable/integrations/pyspark_integration.html#initialise-a-sparksession-using-a-hook

Juan Luis

05/10/2023, 10:25 AM
I think it's the other Andreas @Andreas_Kokolantonakis 😄

Andreas_Kokolantonakis

05/10/2023, 11:43 AM
@Haris Michailidis

Nok Lam Chan

05/10/2023, 12:09 PM
😅 sorry, I need to work on my skills at tagging people…