# questions
p
Hey team, facing a weird dependency issue. It seems the latest kedro-datasets[databricks]~=2.0.0 depends on a fairly old delta-spark~=1.2.1, which is not compatible with pyspark 3.3 or higher (I want to be able to use DatabricksConnect/SparkConnect for local development, which requires pyspark 3.4 or higher). Is there any solution for this, given that delta-spark is already on version 3.1?
d
Thanks for raising this!
You could force the install of the latest delta-spark; my hunch is that it will work if you do that
j
opened https://github.com/kedro-org/kedro-plugins/issues/571, thanks @Pedro Sousa Silva! would you like to send a PR yourself? and see if the CI passes at least 🙂
p
Thanks guys! Let me try that later tonight/tomorrow @Juan Luis 🙂
@datajoely any easy way to force the install of the latest delta-spark? I was attempting that by passing the --no-deps flag to pip install kedro-datasets[databricks]~=2.0.0 and then installing delta-spark~=3.0.0 and the other dependencies manually, but --no-deps also ignores the "databricks" extra
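For reference, a minimal sketch of that two-step install, written as a Python helper that only builds the pip command lines and does not run them. The helper name `build_install_commands` is hypothetical. Extras only select additional dependencies, so with --no-deps the "databricks" extra has nothing left to install and the plain package name is enough in the first step. The pins in the second step are the ones quoted in this thread; whether they resolve together is an assumption, not something verified here.

```python
import sys


def build_install_commands(base_requirement, manual_requirements):
    """Return the two pip invocations as argv lists (nothing is executed)."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        # Step 1: install the package itself and skip ALL of its dependencies.
        # This also skips kedro-datasets' own base dependencies, which are not
        # listed here and would have to be installed manually as well.
        pip + ["--no-deps", base_requirement],
        # Step 2: install the dependencies the extra would have pulled in,
        # with the delta-spark pin swapped for a newer one.
        pip + list(manual_requirements),
    ]


commands = build_install_commands(
    "kedro-datasets~=2.0.0",
    # Pins taken from this thread; pip has to find a pyspark that satisfies
    # both this range and whatever delta-spark itself requires.
    ["delta-spark~=3.0.0", "pyspark>=2.2,<4.0", "pandas>=1.3,<3.0"],
)

for argv in commands:
    print(" ".join(argv))
```

The printed lines can be pasted into a shell, or each argv list can be passed to `subprocess.run` once the pins look right.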
d
delta-spark~=1.2.1
"pyspark>=2.2, <4.0"
"pandas>=1.3, <3.0"
so maybe if you can get those to install it will work?
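A quick way to see what actually ended up in the environment after forcing the install is to ask the standard library for the installed versions. This is a small sketch; `installed_versions` is a hypothetical helper name, and the distribution names are the ones mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distribution_names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distribution_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["kedro-datasets", "delta-spark", "pyspark", "pandas"])
for name, installed in report.items():
    print(f"{name}: {installed or 'not installed'}")
```

This only reports versions; it does not check that they are compatible with each other.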
the alternative is to copy the implementation into your project and reference it as a class path
p
indeed, the more I think about it, the more I believe the easiest way is to copy the implementation of ManagedTableDataset
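To illustrate what "reference it as a class path" means: once the copied class lives in your project, the catalog entry's type would point at its dotted path, for example a hypothetical `my_project.datasets.ManagedTableDataset`. The sketch below shows in plain Python how such a dotted path can be resolved to a class. It is an illustration of the idea only, not Kedro's actual loader, and `load_class` is a made-up name.

```python
from importlib import import_module


def load_class(class_path):
    """Resolve a dotted path such as 'package.module.ClassName' to the class."""
    module_path, _, class_name = class_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected a dotted path, got {class_path!r}")
    # Import the module part, then look the class up as an attribute on it.
    return getattr(import_module(module_path), class_name)


# Demonstrated with a standard-library class so the snippet runs anywhere;
# in a project the path would be the copied dataset's module and class name.
cls = load_class("collections.OrderedDict")
print(cls)
```

The copied module has to be importable from the project's package for the dotted path to resolve.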
d
thanks for raising this though! we’ll release the new versions quickly 🙏
p
Sure. Will try and raise a PR myself for some learning if there's time 🙂
Btw fyi the "spark" extra also enforces a dependency on delta-spark<3.0, which is a bit less restrictive than the "databricks" extra but still won't allow for pyspark 3.5/DBR 14
Thanks!