# questions
n
hi! im trying to deploy a pipeline using the databricks plugin. im able to `init`, `bundle`, and `deploy` (what looks to be) successfully (i can see the job and files created in the UI), but always get this error when running...
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/.ipykernel/6689/command--1-4096408574:22
     20 import importlib
     21 module = importlib.import_module("classification_pipeline")
---> 22 module.classification_pipeline()

AttributeError: module 'classification_pipeline' has no attribute 'classification_pipeline'
it looks like theres confusion about the entry point. some additional details below that may/may not be helpful in debugging...
• i'm following these instructions
• my pipeline has `dev`, `qa`, and `prod` environments configured within `conf` and i'm trying to deploy `qa`
• ive added an existing_cluster_id
• the commands ive ran are
kedro databricks init
kedro databricks bundle --env qa --params runner=ThreadRunner
kedro databricks deploy --env qa --runtime-params runner=ThreadRunner
kedro databricks run classification_pipeline
• "classification_pipeline" is used for my package and project names
any help is appreciated! @Jens Peder Meldgaard @Nok Lam Chan
update - i just tried running through the iris tutorial to see if that works and im running into the same error
AttributeError: module 'iris_databricks' has no attribute 'iris_databricks'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/.ipykernel/1575/command--1-247234484:22
     20 import importlib
     21 module = importlib.import_module("iris_databricks")
---> 22 module.iris_databricks()

AttributeError: module 'iris_databricks' has no attribute 'iris_databricks'
• python 3.12
• kedro-databricks>=0.13.0
• kedro>=1.0.0
something i did notice is that when trying to deploy my pipeline a `src/classification_pipeline/__main__.py` was not generated but it was in the iris tutorial. same for any scripts inside `pyproject.toml`
n
`__main__.py` is the script that gets triggered, and it is often registered as an "entrypoint" in your `pyproject.toml`. for example, `python -m kedro` is equivalent to executing `kedro/__main__.py`.
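To make the `python -m` point concrete, here is a minimal sketch that builds a throwaway package and runs it as a module. The package name `demo_pkg` and the helper `run_package_as_module` are made up for illustration; nothing here is taken from kedro or kedro-databricks.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_package_as_module(package_name: str = "demo_pkg") -> str:
    """Create a throwaway package with a __main__.py and run it via `python -m`."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / package_name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
        # This file is what `python -m demo_pkg` executes.
        (pkg_dir / "__main__.py").write_text('print("running __main__.py")\n')
        result = subprocess.run(
            [sys.executable, "-m", package_name],
            cwd=tmp,  # run from the temp dir so the package is importable
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_package_as_module())
```

If the `__main__.py` file is left out, the same `python -m` call fails because the package cannot be directly executed.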
Without the context I can't tell the root cause.
20 import importlib
21 module = importlib.import_module("classification_pipeline")
---> 22 module.classification_pipeline()
Where is the code coming from? does `classification_pipeline.classification_pipeline` exist?
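One way to answer that question is to repeat the lookup from the traceback outside the job. This is only a sketch: `has_same_named_callable` is a hypothetical helper, and it assumes the package is installed in the environment where you run it.

```python
import importlib
from typing import Optional


def has_same_named_callable(module_name: str, attr_name: Optional[str] = None) -> bool:
    """Return True if the module exposes a callable attribute.

    By default the attribute name equals the module name, which mirrors the
    `module.classification_pipeline()` call shown in the traceback.
    """
    module = importlib.import_module(module_name)
    target = attr_name if attr_name is not None else module_name
    return callable(getattr(module, target, None))


if __name__ == "__main__":
    # Needs the project package installed (for example via `pip install -e .`).
    print(has_same_named_callable("classification_pipeline"))
```

A `False` result means the same lookup would raise the `AttributeError` from the traceback.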
j
This sounds very odd. Was this bootstrapped using the `databricks-iris` starter? My first intuition is that you are missing an `__init__.py` file - is that the case? Otherwise, I would need more information about the project to be able to help. Perhaps you could provide a dump of the file tree for the project? 😊
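A quick way to check for a missing `__init__.py` is to walk the source folder. The sketch below uses a hypothetical helper, `find_dirs_missing_init`, which lists directories that contain `.py` files but no `__init__.py`. The `src/classification_pipeline` path is an assumption based on the layout mentioned earlier in the thread.

```python
from pathlib import Path


def find_dirs_missing_init(package_root: Path) -> list[Path]:
    """List directories under package_root that hold .py files but no __init__.py."""
    missing = []
    for directory in [package_root, *package_root.rglob("*")]:
        # Skip plain files and bytecode caches.
        if not directory.is_dir() or directory.name == "__pycache__":
            continue
        has_python_files = any(directory.glob("*.py"))
        if has_python_files and not (directory / "__init__.py").exists():
            missing.append(directory)
    return sorted(missing)


if __name__ == "__main__":
    # Assumed layout: run from the project root.
    for path in find_dirs_missing_init(Path("src/classification_pipeline")):
        print(f"no __init__.py in {path}")
```

An empty result does not rule out other packaging problems. It only covers this one suspicion.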
n
if by bootstrapped you mean i built classification_pipeline off of an iris-databricks starter, then no. it was just a regular kedro project. here is the file tree for the starter i tried
|-- .databricks
|   `-- bundle
|       |-- dev
|       |   |-- .internal
|       |   |-- bin
|       |   |   `-- terraform
|       |   |-- deployment.json
|       |   |-- sync-snapshots
|       |   |   `-- 0177540d148f2c8e.json
|       |   `-- terraform
|       |       |-- .terraform
|       |       |   `-- providers
|       |       |       `-- registry.terraform.io
|       |       |           `-- databricks
|       |       |               `-- databricks
|       |       |                   `-- 1.65.1
|       |       |                       `-- darwin_arm64
|       |       |                           |-- CHANGELOG.md
|       |       |                           |-- LICENSE
|       |       |                           |-- NOTICE
|       |       |                           `-- terraform-provider-databricks_v1.65.1
|       |       |-- .terraform.lock.hcl
|       |       |-- bundle.tf.json
|       |       |-- plan
|       |       `-- terraform.tfstate
|       `-- prod
|           |-- .internal
|           |-- bin
|           |   `-- terraform
|           |-- sync-snapshots
|           `-- terraform
|-- .env
|-- .gitignore
|-- README.md
|-- build
|   |-- bdist.macosx-11.0-arm64
|   `-- lib
|       `-- iris_databricks
|           |-- __init__.py
|           |-- __main__.py
|           |-- hooks.py
|           |-- pipeline_registry.py
|           |-- pipelines
|           |   |-- __init__.py
|           |   `-- iris
|           |       |-- __init__.py
|           |       |-- nodes.py
|           |       `-- pipeline.py
|           `-- settings.py
|-- conf
|   |-- README.md
|   |-- base
|   |   |-- catalog.yml
|   |   |-- parameters.yml
|   |   `-- spark.yml
|   |-- dev
|   |   |-- .gitkeep
|   |   |-- catalog.yml
|   |   `-- databricks.yml
|   |-- local
|   |   `-- .gitkeep
|   |-- logging.yml
|   `-- prod
|       |-- .gitkeep
|       |-- catalog.yml
|       `-- databricks.yml
|-- data
|   |-- 01_raw
|   |   |-- .gitkeep
|   |   `-- iris.csv
|   |-- 02_intermediate
|   |   `-- .gitkeep
|   |-- 03_primary
|   |   `-- .gitkeep
|   |-- 04_feature
|   |   `-- .gitkeep
|   |-- 05_model_input
|   |   `-- .gitkeep
|   |-- 06_models
|   |   `-- .gitkeep
|   |-- 07_model_output
|   |   `-- .gitkeep
|   `-- 08_reporting
|       `-- .gitkeep
|-- databricks.yml
|-- dist
|   |-- conf-iris_databricks.tar.gz
|   `-- iris_databricks-0.1-py3-none-any.whl
|-- docs
|   `-- source
|       |-- conf.py
|       `-- index.rst
|-- notebooks
|   `-- .gitkeep
|-- pyproject.toml
|-- requirements.txt
|-- resources
|   |-- iris_databricks.yml
|   `-- iris_databricks_iris.yml
|-- src
|   |-- iris_databricks
|   |   |-- README.md
|   |   |-- __init__.py
|   |   |-- __main__.py
|   |   |-- __pycache__
|   |   |   |-- __init__.cpython-312.pyc
|   |   |   |-- hooks.cpython-312.pyc
|   |   |   |-- pipeline_registry.cpython-312.pyc
|   |   |   `-- settings.cpython-312.pyc
|   |   |-- hooks.py
|   |   |-- pipeline_registry.py
|   |   |-- pipelines
|   |   |   |-- __init__.py
|   |   |   |-- __pycache__
|   |   |   |   `-- __init__.cpython-312.pyc
|   |   |   `-- iris
|   |   |       |-- __init__.py
|   |   |       |-- __pycache__
|   |   |       |   |-- __init__.cpython-312.pyc
|   |   |       |   |-- nodes.cpython-312.pyc
|   |   |       |   `-- pipeline.cpython-312.pyc
|   |   |       |-- nodes.py
|   |   |       `-- pipeline.py
|   |   `-- settings.py
|   `-- iris_databricks.egg-info
|       |-- PKG-INFO
|       |-- SOURCES.txt
|       |-- dependency_links.txt
|       |-- entry_points.txt
|       |-- requires.txt
|       `-- top_level.txt
|-- tests
|   |-- __init__.py
|   |-- test_pipeline.py
|   `-- test_run.py
`-- uv.lock
j
As far as i can see, this is just the starter. I am a bit puzzled why this wouldn’t work. Could you perhaps share the `[project.scripts]` section of your `pyproject.toml`?
I unfortunately won’t have time to do a more thorough debugging session before next week, but I’m happy to help resolve this! 😊
n
yes this is the starter - i figured since youre much more familiar with this than my own pipeline it could be easier to debug. here is the script:
[project.scripts]
"iris-databricks" = "iris_databricks.__main__:main"
no worries about working on this next week, appreciate the help 🙂
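For reference, a `[project.scripts]` value such as `iris_databricks.__main__:main` follows the `module:attribute` form. Below is a minimal sketch of resolving such a string by hand to confirm the target exists. `load_entry_point` is a hypothetical helper, not part of kedro or kedro-databricks, and it assumes the package is installed in the current environment.

```python
import importlib
from typing import Any


def load_entry_point(spec: str) -> Any:
    """Resolve a 'module:attribute' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    if not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj: Any = importlib.import_module(module_name)
    # Dotted attribute paths such as 'Class.method' are walked step by step.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


if __name__ == "__main__":
    # Raises ImportError or AttributeError if the target cannot be found.
    main = load_entry_point("iris_databricks.__main__:main")
    print(main)
```

This looks up `main` inside `iris_databricks.__main__`, which is a different lookup from the `module.iris_databricks()` call in the traceback.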