# questions
y
Hi Kedro community, I'm trying to run a Kedro pipeline on Databricks using a Serverless cluster, but I’ve encountered several issues with installing the required packages. Below is my requirements.txt file:
ipykernel==6.29.5
ipython==8.18.1
jupyterlab==4.2.3
kedro==0.19.6
kedro-datasets[databricks]==4.0.0
kedro-mlflow==0.12.2
kedro-telemetry==0.5.0
kedro-viz==9.1.0
openpyxl==3.1.5
numpy==1.26.4
pandas~=1.5
pillow==10.4.0
plotly==5.22.0
pre-commit==3.8.0
polars==1.7.1
python-dotenv==1.0.1
scikit-learn==1.5.1
scipy==1.13.1
seaborn==0.12.2
shap==0.46.0
tqdm==4.66.4
xgboost==2.1.0
The main issue is that s3fs and hdfs, which are necessary for the DataCatalog with Kedro-Databricks, are not being installed. After installing those dependencies manually, I’m now getting the following error:
ImportError: cannot import name 'deprecated' from 'typing_extensions'
I’ve been unable to resolve this error. Here’s the code I’m using to load the catalog, which triggers the error:
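from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

# project_root, env, and params are assumed to be defined earlier
bootstrap_project(project_root)

catalog = (
    KedroSession.create(
        project_path=project_root,
        env=env,
        extra_params=params,
    )
    .load_context()
    .catalog
)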
Interestingly, when I run the pipeline on an All-purpose cluster, everything works fine. My question is: Has anyone successfully run Kedro on a Databricks Serverless cluster without issues? Or is it possible that Databricks Serverless doesn’t fully support Kedro yet? Any advice or experiences would be greatly appreciated. Thanks in advance!
j
Hi @Yair Camborda Morocho, this looks like a problem with your typing_extensions version. Can you paste the full traceback, to see where the error comes from, and the complete output of pip freeze, to see the versions of all dependencies?
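If running pip freeze is awkward on the Serverless cluster, a rough equivalent can be collected from inside the notebook itself. The sketch below uses only the standard library; the helper names (installed_version, has_attribute, freeze_lines) are made up for illustration and are not part of Kedro or Databricks. As far as I know, `deprecated` was added to typing_extensions in release 4.5.0, so an older preinstalled copy would explain the ImportError, but treat that as an assumption until the output confirms it.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_attribute(module_name, attr):
    """Report whether module_name can be imported and exposes attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


def freeze_lines():
    """Build a sorted, pip-freeze-like list of 'name==version' strings."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            pairs.add(f"{name}=={dist.version}")
    return sorted(pairs, key=str.lower)


if __name__ == "__main__":
    # Packages mentioned in this thread; extend the list as needed.
    for pkg in ("typing_extensions", "s3fs", "hdfs", "kedro", "kedro-datasets"):
        print(f"{pkg}: {installed_version(pkg)}")

    # The failing import from the traceback, checked without raising.
    print("typing_extensions.deprecated available:",
          has_attribute("typing_extensions", "deprecated"))

    # Full listing, comparable to the output of pip freeze.
    print("\n".join(freeze_lines()))
```

Running this on both the Serverless and the All-purpose cluster and comparing the two outputs should show which versions differ between the environments.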