# questions
**d:**
Friends, I'm starting to use `kedro` with `uv`. If I start the project with PySpark, I get an error. Here are the steps to reproduce. Start by running:
```shell
uvx kedro new
```
When prompted, I choose the option to install all tools (this includes PySpark). The project is created. I go into the directory and run:
```shell
uv run ipython
```
Inside ipython, if I try `%load_ext kedro.ipython`, I get the error:
```
The operation couldn't be completed. Unable to locate a Java Runtime.
Please visit http://www.java.com for information on installing Java.

/Users/davi/test/.venv/lib/python3.11/site-packages/pyspark/bin/spark-class: line 97: CMD: bad array subscript
head: illegal line count -- -1

Traceback (most recent call last):
  in <module>:1

  /Users/davi/test/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2482 in run_line_magic
    2479   if getattr(fn, "needs_local_scope", False):
    2480       kwargs['local_ns'] = self.get_local_scope(stack_depth)
    2481   with self.builtin_trap:
  ❱ 2482       result = fn(*args, **kwargs)
...
PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number.
```
Any idea on what might be happening? BTW, I'm on a Mac.
**r:**
The error seems to be due to an unavailable JRE. Is this the first time you are using PySpark, or did this only start happening now? Could you please check your `conf/base/spark.yml`?
If you have Java configured on your machine and you also ran `pip install -r requirements.txt` for the project, then with the default `spark.yml` below there should not be any issues. Please check that a JRE is available (`java -version`):
```yaml
spark.driver.maxResultSize: 3g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true

# https://docs.kedro.org/en/stable/integrations/pyspark_integration.html#tips-for-maximising-concurrency-using-threadrunner
spark.scheduler.mode: FAIR
```
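As a quick sanity check before loading the extension, you can verify from Python that a `java` executable is actually reachable on `PATH` — a minimal sketch (the function name `java_available` is just illustrative, not part of Kedro or PySpark):

```python
import shutil
import subprocess

def java_available() -> bool:
    """Return True if a `java` executable is on PATH and runs successfully."""
    java = shutil.which("java")
    if java is None:
        return False
    try:
        # `java -version` prints its banner to stderr; capture both streams
        subprocess.run([java, "-version"], capture_output=True, check=True)
        return True
    except (subprocess.CalledProcessError, OSError):
        return False

if java_available():
    print("JRE found; PySpark should be able to start its Java gateway.")
else:
    print("No working JRE on PATH; install OpenJDK first.")
```

If this prints that no JRE was found, PySpark's `[JAVA_GATEWAY_EXITED]` error is expected.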
**d:**
Thanks, @Ravi Kumar Pilla. I switched to a Mac recently and had not installed OpenJDK.
**r:**
Installing it should fix the issue. If you still face issues, please let us know. Thank you!
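For reference, a hedged sketch of the usual macOS install steps via Homebrew — the version (17) and the Apple Silicon paths are illustrative, so it prints the commands for review rather than running them (they need Homebrew and sudo):

```shell
# Print the typical Homebrew OpenJDK setup commands (illustrative, not executed).
install_hint() {
  echo 'brew install openjdk@17'
  # Homebrew's openjdk is keg-only; symlink it so macOS tools can find it:
  echo 'sudo ln -sfn /opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-17.jdk'
  echo 'java -version'
}

install_hint
```

After installing, restart the shell so `java` is picked up, then re-run `uv run ipython` and `%load_ext kedro.ipython`.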
**d:**
It worked like a charm. Sorry for the inconvenience!