# Kedro Databricks now supports Databricks Free Edition

In version 0.14.0, a major refactor has been made to the `kedro-databricks` plugin, which means that we now support Databricks Free Edition out of the box.

- Repository
- Documentation

## Breaking changes

To add support for Databricks Free Edition, a breaking change was necessary: the way resource overrides are defined had to be changed to accommodate overriding more resources than just jobs. Therefore, any current users wanting to upgrade to version 0.14.0 or above must make the following changes to any overrides specified in `conf/databricks*` or `conf/**/databricks*`.

Overrides before:
```yaml
default:
    environments:
        - environment_key: default
    spec:
        environment_version: '4'
        dependencies:
            - ../dist/*.whl
    tasks:
        - task_key: default
          environment_key: default
```

Overrides after:

```yaml
resources:
    jobs:
        default:
            environments:
                - environment_key: default
            spec:
                environment_version: '4'
                dependencies:
                    - ../dist/*.whl
            tasks:
                - task_key: default
                  environment_key: default
```
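Because overrides are now nested under a top-level `resources:` key, the same mechanism can in principle override other resource types as well, not just jobs. A minimal sketch, assuming a hypothetical volume resource named `my_volume` (the field names follow the Databricks Asset Bundle schema; which resources exist to override depends on your bundle):

```yaml
resources:
    jobs:
        default:
            tasks:
                - task_key: default
                  environment_key: default
    volumes:
        my_volume:   # hypothetical volume resource, for illustration only
            comment: Managed by kedro-databricks
```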
## Test it out for yourself

NOTE: Running this example requires permissions to create and delete jobs and volumes.

Create a Kedro project based on the `databricks-iris` starter with the following command:
```bash
kedro new --starter="databricks-iris"
```
After the project is created, navigate to the newly created project directory:
```bash
cd <project-name>  # change directory
```
Install the required dependencies:
```bash
pip install -r requirements.txt
pip install kedro-databricks
```
#### Initializing the Databricks Asset Bundle

Now you can initialize the Databricks Asset Bundle:

```bash
kedro databricks init
```
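Under the hood this sets up a Databricks Asset Bundle, which is configured through a `databricks.yml` file at the project root. As a rough sketch of what such a bundle configuration looks like (illustrative only; the file the plugin generates for you may differ):

```yaml
bundle:
    name: <project-name>

targets:
    dev:
        default: true
        workspace:
            host: https://<workspace-id>.cloud.databricks.com
```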
The plugin loads all configuration matching `conf/databricks*` or `conf/databricks/*`.

#### Generating bundle resources

Once you have initialized the Databricks Asset Bundle, you can generate the Asset Bundle resource definitions. This step is necessary to prepare your Kedro project for deployment to Databricks. Run the following command:
```bash
kedro databricks bundle
```
This command will generate the following files:
```
├── resources/
│   ├── target.<env>.<resource-type>.<resource-name>.yml  # we support any Databricks resource type
│   ├── target.<env>.jobs.<project>.yml                   # corresponds to `kedro run`
│   ├── target.<env>.jobs.<project>_<pipeline>.yml        # corresponds to `kedro run --pipeline <pipeline>`
```
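For orientation, each generated file is plain Databricks Asset Bundle YAML. A schematic example of a job resource, assuming the plugin maps Kedro nodes to wheel-based tasks (placeholder names throughout; the actual generated file will reflect your project's nodes and may differ in detail):

```yaml
resources:
    jobs:
        <project>:
            name: <project>
            tasks:
                - task_key: <node-name>            # placeholder for a Kedro node
                  environment_key: default
                  python_wheel_task:
                      package_name: <project>
                      entry_point: <entry-point>   # placeholder; set by the plugin
                      parameters: ["--nodes=<node-name>"]
```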
#### Deploying to Databricks

With your Kedro project initialized and the Asset Bundle resources generated, you can now deploy your Kedro project to Databricks. Run the following command:
```bash
kedro databricks deploy
```
When deployed, you will see a summary of the deployed resources in your terminal, similar to:
```
Name: <project-name>
Target: dev
Workspace:
  Host: https://<workspace-id>.cloud.databricks.com
  User: <email>
  Path: /Workspace/Users/<email>/.bundle/<project-name>/dev
Resources:
  Jobs:
    <project-name>:
      Name: [dev <username>] <project-name>
      URL:  https://<workspace-id>.cloud.databricks.com/jobs/...
    <project-name>_iris:
      Name: [dev <username>] <project-name>_iris
      URL:  https://<workspace-id>.cloud.databricks.com/jobs/...
  Volumes:
    <project-name>_volume:
      Name: <username>
      URL:  https://<workspace-id>.cloud.databricks.com/explore/data/volumes/workspace/default/...
```
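Because the deployment is a standard Databricks Asset Bundle, you can also inspect it with the Databricks CLI directly, assuming the CLI is installed and authenticated against the same workspace:

```bash
databricks bundle validate   # check the bundle configuration
databricks bundle summary    # list the deployed resources and their URLs
```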
That's it! Your pipelines have now been deployed to Databricks.

#### Running the job

To run the job on Databricks, use the following command:
```bash
kedro databricks run <project_name>
```
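The bundle also contains one job per registered pipeline (the `<project>_<pipeline>` resources generated earlier), so presumably a single pipeline can be run the same way by passing that job's name, for example:

```bash
kedro databricks run <project_name>_iris   # assumes the iris pipeline job from the deploy summary
```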
It might take a few minutes to run the job, depending on the size of your dataset and the complexity of your pipelines. While you wait, you can monitor the progress of your job in the Databricks UI.

#### Cleaning up resources

To clean up the resources created by the plugin, use the following command:
```bash
kedro databricks destroy
```
This command will remove the Databricks Asset Bundle configuration and any resources created during the deployment process. It is a good practice to clean up resources when they are no longer needed to avoid unnecessary costs.