Yair Camborda Morocho
09/20/2024, 5:12 PM
requirements.txt file:
ipykernel==6.29.5
ipython==8.18.1
jupyterlab==4.2.3
kedro==0.19.6
kedro-datasets[databricks]==4.0.0
kedro-mlflow==0.12.2
kedro-telemetry==0.5.0
kedro-viz==9.1.0
openpyxl==3.1.5
numpy==1.26.4
pandas~=1.5
pillow==10.4.0
plotly==5.22.0
pre-commit==3.8.0
polars==1.7.1
python-dotenv==1.0.1
scikit-learn==1.5.1
scipy==1.13.1
seaborn==0.12.2
shap==0.46.0
tqdm==4.66.4
xgboost==2.1.0
The main issue is that s3fs and hdfs, which are necessary for the DataCatalog with Kedro-Databricks, are not being installed. After installing those dependencies manually, I'm now getting the following error:
ImportError: cannot import name 'deprecated' from 'typing_extensions'
I’ve been unable to resolve this error. Here’s the code I’m using to load the catalog, which triggers the error:
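from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

# project_root, env and params are defined elsewhere in the notebook
bootstrap_project(project_root)
catalog = (
    KedroSession.create(
        project_path=project_root,
        env=env,
        extra_params=params,
    )
    .load_context()
    .catalog
)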
Interestingly, when I run the pipeline on an All-purpose cluster, everything works fine.
My question is: Has anyone successfully run Kedro on a Databricks Serverless cluster without issues? Or is it possible that Databricks Serverless doesn’t fully support Kedro yet?
Any advice or experiences would be greatly appreciated. Thanks in advance!
Juan Luis
09/21/2024, 5:18 PM
typing_extensions version. Can you paste the full traceback, to see where the error comes from, and a complete output of pip freeze, to see the versions of all dependencies?
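Alongside the pip freeze output, a quick check from a notebook cell on the Serverless cluster can show which typing_extensions the runtime actually resolves. This is a minimal sketch, not a confirmed diagnosis: the helper names package_version and has_deprecated are made up for illustration, and it relies on the assumption that `deprecated` only exists in newer typing_extensions releases (to my knowledge, 4.5.0 and later).

```python
import importlib.metadata as md


def package_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return md.version(name)
    except md.PackageNotFoundError:
        return None


def has_deprecated():
    """Report whether the importable typing_extensions exposes `deprecated`."""
    try:
        import typing_extensions
    except ImportError:
        return False
    return hasattr(typing_extensions, "deprecated")


if __name__ == "__main__":
    # Packages mentioned in this thread; extend the tuple as needed.
    for pkg in ("typing_extensions", "kedro", "kedro-datasets", "s3fs", "hdfs"):
        print(f"{pkg}=={package_version(pkg)}")
    print("typing_extensions.deprecated available:", has_deprecated())
```

If the last line prints False while pip freeze reports a recent typing_extensions, that would suggest the environment is importing an older copy that ships with the runtime rather than the one installed from requirements.txt.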