#questions

Matthias Roels

08/31/2023, 12:03 PM
Quick question, why is kedro-datasets no longer compatible with Spark 3.4?

datajoely

08/31/2023, 12:16 PM
looking into this

marrrcin

08/31/2023, 12:33 PM
Just FYI, I have no problems with pyspark==3.4.1 and SparkDataSet from kedro.extras: https://github.com/kedro-org/kedro/blob/0293dc15812b27330bba31a01c7b332b3165af2a/kedro/extras/datasets/spark/spark_dataset.py (Kedro 0.18.12). I haven't tested with kedro-datasets though.

datajoely

but trying to work out why
current hypothesis is that maybe Python 3.11 support was the issue
equally the dates are confusing here, Python 3.11 support was added to PySpark in 3.4.0

Matthias Roels

08/31/2023, 12:40 PM
Yeah the dates are because Spark provides patch updates for several versions
My guess is that it has something to do with the exception handling in the SparkDataset. There is a deprecation comment in that file. But that should be no reason to remove support for Spark 3.4. In fact, this is something that needs to be fixed in SparkDataset…
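
A minimal sketch of the kind of version-tolerant exception handling being discussed. The thread does not show the actual code, so this assumes the deprecation concerns the `desc` attribute on Spark's `AnalysisException`. `FakeAnalysisException`, `error_message` and `path_missing` are hypothetical names used only for illustration, and the stand-in class keeps the example runnable without PySpark installed:

```python
class FakeAnalysisException(Exception):
    """Hypothetical stand-in for Spark's AnalysisException.

    with_desc=True mimics an exception that still exposes `desc`;
    with_desc=False mimics one where only str(exc) is available.
    """

    def __init__(self, message, with_desc=True):
        super().__init__(message)
        if with_desc:
            self.desc = message


def error_message(exc):
    """Read the message without relying on `desc` being present."""
    return exc.desc if hasattr(exc, "desc") else str(exc)


def path_missing(exc):
    """Assumed check: treat a 'Path does not exist' error as 'dataset absent'."""
    return "Path does not exist" in error_message(exc)


old_style = FakeAnalysisException("Path does not exist: /data/raw", with_desc=True)
new_style = FakeAnalysisException("Path does not exist: /data/raw", with_desc=False)
print(path_missing(old_style), path_missing(new_style))
```

Handling both shapes in the dataset code itself would avoid needing an upper bound on the Spark version for this reason alone.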

datajoely

08/31/2023, 12:42 PM
Yes agreed
the person who made this change is at the 🦷 dentist, will get an answer to you shortly

Sajid Alam

08/31/2023, 1:23 PM
Hi, I believe this was an oversight as we moved between pyproject.toml and setup.py midway through the ticket.
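
To illustrate how an overly tight upper bound in a dependency declaration can exclude a release, here is a rough sketch. The bounds `2.2`, `3.4` and `4.0` are hypothetical examples, not the actual values from either file, and `parse` and `allowed` are illustrative helpers that only handle plain numeric versions:

```python
def parse(version, width=3):
    """Turn '3.4.1' into (3, 4, 1), padding short versions with zeros."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def allowed(version, lower, upper):
    """True if lower <= version < upper, comparing release tuples."""
    return parse(lower) <= parse(version) < parse(upper)


# Hypothetical pin ">=2.2, <3.4" rejects 3.4.1; ">=2.2, <4.0" accepts it.
print(allowed("3.4.1", "2.2", "3.4"))  # False
print(allowed("3.4.1", "2.2", "4.0"))  # True
```

For real requirement strings, the `packaging` library's `SpecifierSet` is the more robust choice, since it follows the full version specifier rules rather than this simplified comparison.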

datajoely

08/31/2023, 1:30 PM
thanks @Sajid Alam

Matthias Roels

08/31/2023, 2:11 PM
So it should be ok with 3.4?

Sajid Alam

08/31/2023, 2:12 PM
Yes!

datajoely

08/31/2023, 2:33 PM
if you want to force the install with Spark 3.4 despite the version pin, it will work
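
If you do override the pin, one way to check what actually ended up in the environment is to read the installed package metadata. This is a sketch using only the standard library. `installed_version` and `declared_pins` are hypothetical helper names, and the output depends entirely on what is installed locally:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def declared_pins(package, dependency):
    """Return the requirement strings of `package` that start with `dependency`."""
    try:
        requirements = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []
    return [req for req in requirements if req.lower().startswith(dependency)]


print("pyspark installed:", installed_version("pyspark"))
print("kedro-datasets pins:", declared_pins("kedro-datasets", "pyspark"))
```

Comparing the two lines shows whether the installed Spark version falls outside the range the installed kedro-datasets release declares.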

Matthias Roels

08/31/2023, 3:31 PM
But it will be fixed with a 1.6.1 release?

Sajid Alam

08/31/2023, 3:33 PM
Yes, there is a PR out for it now: https://github.com/kedro-org/kedro-plugins/pull/323