# questions
j
Hi! Started Kedro just today so apologies if this question has already been answered elsewhere or is super trivial. I've followed the tutorial to create a new project with an example pipeline, but when I run `kedro run` in the directory of the project (after having run `uv pip install -r requirements.txt`), I get the following error:
PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number.
It seems like I need to install Java for this to work, but there's no mention of Java anywhere in the docs, so this doesn't feel like the right option. I'm on Windows, running locally in VS Code, and didn't encounter any issues installing the requirements. Is anyone able to help with this error? Thanks! 🙂
l
Hey Joseph! PySpark is a Python interface for Apache Spark, and since Spark runs on the JVM, you need Java installed for it to work correctly. Our docs currently assume that the user has a basic understanding of PySpark and focus on the Kedro side of things, but you can find the information you need in the Spark docs: https://spark.apache.org/docs/latest/api/python/getting_started/install.html
Note that PySpark requires Java 8 (except prior to 8u371), 11 or 17 with JAVA_HOME properly set. If using JDK 11, set -Dio.netty.tryReflectionSetAccessible=true for Arrow related features and refer to Downloading.
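If it helps, here is a minimal sketch for checking that prerequisite before calling `kedro run`. It is not taken from the Kedro or Spark docs: the names `parse_java_major`, `check_java` and `SUPPORTED_MAJORS` are made up for illustration, and the supported versions are simply copied from the note quoted above, so they may differ for your Spark release.

```python
import os
import re
import shutil
import subprocess
from typing import Optional

# Assumption: copied from the Spark install note quoted above; check the
# docs for the Spark version you actually have installed.
SUPPORTED_MAJORS = {8, 11, 17}


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Handles both the legacy scheme ("1.8.0_391" means Java 8) and the
    modern scheme ("17.0.9" means Java 17). Returns None if no version
    string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def check_java() -> dict:
    """Report whether Java is on PATH, what JAVA_HOME is, and the major version."""
    java_path = shutil.which("java")
    report = {
        "java_on_path": java_path,
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        "major_version": None,
    }
    if java_path is not None:
        # `java -version` usually writes to stderr, so inspect both streams.
        result = subprocess.run(
            [java_path, "-version"], capture_output=True, text=True, check=False
        )
        report["major_version"] = parse_java_major(result.stderr + result.stdout)
    return report


if __name__ == "__main__":
    report = check_java()
    print(report)
    if report["major_version"] not in SUPPORTED_MAJORS:
        print("Java is missing or its version is not in the list quoted above.")
    if not report["JAVA_HOME"]:
        print("JAVA_HOME is not set.")
```

If `java_on_path` or `JAVA_HOME` comes back as `None`, that would be consistent with the `JAVA_GATEWAY_EXITED` error, since PySpark cannot start the JVM it depends on.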
Hope this helps!
j
Fantastic thanks Laura 👍
n
@Joseph McLeish Do you still have the docs open? I wonder which page it is. I don't recall the tutorial requiring a PySpark installation, which could be a bit too complicated for a starting example. We do support PySpark as an optional extra tool, since it's a more realistic use case where you have a preprocessing pipeline running in Spark and ML code in Python.
In particular, when you create a new project with `kedro new`, this is the list of options. If you choose `6` or `all`, it will include `pyspark` in `hooks.py`, which sets up the Spark connection. If you don't need this, you can skip the option.
j
Yep, I was just following the tutorial and used `all`, as I wanted to test out the functionality as fully as possible.
I was wondering why the tools in your screenshot above (and my project setup too) didn't include `7) Kedro-Viz: Kedro's native visualisation tool` like in the tutorial. I'd installed kedro-viz, so I was surprised to see only options 1-6 during Kedro project setup.