Davi Sales Barreira
04/11/2025, 5:42 PM
kedro with uv. If I start the package with Pyspark, I get an error. Here are the steps to reproduce.
Start running:
uvx kedro new
When prompted, I choose the option to install all tools (this includes pyspark).
The project is created. I get into the directory and run:
uv run ipython
Inside ipython, if I try %load_ext kedro.ipython, then I get the error:
The operation couldn't be completed. Unable to locate a Java Runtime.
Please visit <http://www.java.com> for information on installing Java.
/Users/davi/test/.venv/lib/python3.11/site-packages/pyspark/bin/spark-class: line 97: CMD: bad array subscript
head: illegal line count -- -1
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:1 │
│ │
│ /Users/davi/test/.venv/lib/python3.11/site-packages/IPyt │
│ hon/core/interactiveshell.py:2482 in run_line_magic │
│ │
│ 2479 │ │ │ if getattr(fn, "needs_local_scope", False): │
│ 2480 │ │ │ │ kwargs['local_ns'] = self.get_local_scope(stack_depth) │
│ 2481 │ │ │ with self.builtin_trap: │
│ ❱ 2482 │ │ │ │ result = fn(*args, **kwargs)
....
PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number.
Any idea on what might be happening? BTW, I'm on a Mac.
Ravi Kumar Pilla
04/11/2025, 5:51 PM
conf/base/spark.yml
Ravi Kumar Pilla
04/11/2025, 5:59 PM
pip install -r requirements.txt for the project, with default spark.yml below, there should not be any issues. Please check if JRE is available (java -version).
spark.driver.maxResultSize: 3g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true
# <https://docs.kedro.org/en/stable/integrations/pyspark_integration.html#tips-for-maximising-concurrency-using-threadrunner>
spark.scheduler.mode: FAIR
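The JRE check suggested above can also be run from inside the project's Python environment before loading the Kedro extension. This is a minimal sketch, not part of Kedro or PySpark: the helper name java_runtime_available is hypothetical. It runs java -version rather than only looking for the executable, because on macOS a java command can be on the PATH and still fail with "Unable to locate a Java Runtime" when no JDK/JRE is installed.

```python
import os
import shutil
import subprocess


def java_runtime_available(java_cmd: str = "java") -> bool:
    """Return True if `<java_cmd> -version` runs and exits with status 0.

    Hypothetical helper for illustration; not a Kedro or PySpark API.
    """
    exe = shutil.which(java_cmd)
    if exe is None:
        # No executable with that name on the PATH.
        return False
    try:
        result = subprocess.run(
            [exe, "-version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A launcher stub without an installed runtime exits non-zero.
    return result.returncode == 0


if __name__ == "__main__":
    print("JAVA_HOME:", os.environ.get("JAVA_HOME", "<not set>"))
    print("java usable:", java_runtime_available())
```

If this prints "java usable: False", the JAVA_GATEWAY_EXITED error is most likely caused by the missing Java runtime rather than by the spark.yml settings.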
Davi Sales Barreira
04/11/2025, 6:14 PM
Ravi Kumar Pilla
04/11/2025, 6:16 PM
Davi Sales Barreira
04/11/2025, 6:23 PM