# questions
f
Hi fellows, I am cleaning up dependencies in our kedro code and, upon scrutiny, I am a bit confused by the dependencies for `databricks.ManagedTableDataset`. The `pyproject.toml` (https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/pyproject.toml) states:
```toml
hdfs-base = ["hdfs>=2.5.8, <3.0"]
s3fs-base = ["s3fs>=2021.4"]
...
databricks-managedtabledataset = ["kedro-datasets[hdfs-base,s3fs-base]"]
databricks = ["kedro-datasets[databricks-managedtabledataset]"]
```
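To make the extras chain concrete, here is a toy resolver over just the lines quoted above (not the full `pyproject.toml`): it flattens the self-referencing `kedro-datasets[...]` entries and shows which concrete packages the `databricks` extra actually pulls in.

```python
# Toy resolver for the extras quoted above. Entries of the form
# "kedro-datasets[...]" recurse into other extras of the same package.
extras = {
    "hdfs-base": ["hdfs>=2.5.8, <3.0"],
    "s3fs-base": ["s3fs>=2021.4"],
    "databricks-managedtabledataset": ["kedro-datasets[hdfs-base,s3fs-base]"],
    "databricks": ["kedro-datasets[databricks-managedtabledataset]"],
}

def resolve(extra):
    """Flatten an extra into the concrete requirements it ultimately installs."""
    reqs = set()
    for dep in extras[extra]:
        if dep.startswith("kedro-datasets["):
            # Strip the "kedro-datasets[" prefix and the trailing "]",
            # then recurse into each referenced extra.
            inner = dep[len("kedro-datasets["):-1]
            for sub in inner.split(","):
                reqs |= resolve(sub.strip())
        else:
            reqs.add(dep)
    return reqs

print(sorted(resolve("databricks")))
# → ['hdfs>=2.5.8, <3.0', 's3fs>=2021.4'] — pyspark never appears.
```

So, per the quoted snippet, installing `kedro-datasets[databricks]` brings in only `hdfs` and `s3fs`.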
But in the implementation, I don't see any reference to those two packages, while the dataset requires `pyspark`, which is not stated as a dependency if I am not mistaken. Could you tell me if my interpretation is incorrect?
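One quick way to check a claim like this is to parse the dataset module with `ast` and list its top-level imports. The `source` string below is a hypothetical stand-in for the `ManagedTableDataset` implementation, not the real file:

```python
import ast

# Hypothetical stand-in for the ManagedTableDataset source; in practice
# you would read the real module file from kedro-datasets instead.
source = """
from pyspark.sql import DataFrame, SparkSession
import pandas as pd
"""

tree = ast.parse(source)
imported = set()
for node in ast.walk(tree):
    if isinstance(node, ast.Import):
        # "import pandas as pd" -> top-level package "pandas"
        imported.update(alias.name.split(".")[0] for alias in node.names)
    elif isinstance(node, ast.ImportFrom) and node.module:
        # "from pyspark.sql import ..." -> top-level package "pyspark"
        imported.add(node.module.split(".")[0])

print(sorted(imported))
# → ['pandas', 'pyspark'] — neither hdfs nor s3fs is imported here.
```

Running the same scan over the real module would confirm whether the declared extras match the actual imports.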
d
This is possibly true, and no one has ever reported the issue until now — most likely because you're so likely to have PySpark in a Databricks environment that it's never come up! Please submit a PR or a GitHub issue; it would be much appreciated.
👍 1
f
I will do that. 👍
🙏 2
j
there are several issues related to this, see https://github.com/kedro-org/kedro-plugins/issues/135 and linked issues
f
@Juan Luis, while I see the connection between the two groups of datasets, they are nevertheless independent if I am not mistaken. By mentioning those issues, do you mean that we shouldn't fix the Databricks part without also fixing the more generic Spark ones?