j
# Announcing `kedro-databricks`: A New Kedro Plugin for Seamless Databricks Integration
šŸš€ Developing pipelines on Databricks just got a whole lot easier! I am excited to introduce `kedro-databricks`, a powerful new plugin designed to enhance your Kedro experience on Databricks. This plugin provides a streamlined, efficient, and developer-friendly approach to deploying and managing Kedro pipelines on the Databricks platform.

## Key Features

- **Initialization:** Transform your local Kedro project into a Databricks Asset Bundle project with a single command.
- **Generation:** Effortlessly generate Asset Bundle resource definitions.
- **Deployment:** Simplify the deployment of your Kedro projects to Databricks.

# How to get started

## Prerequisites

Before you begin, ensure that the Databricks CLI is installed and configured. For more information on installation and configuration, please refer to the Databricks CLI documentation:

- Installation Help
- Configuration Help

## Creating a new project

Before creating a new project, ensure you have installed Kedro into a virtual environment. Then use the following command:
```bash
pip install kedro
```
Initialize a new Kedro project with the `databricks-iris` starter using the following command:
```bash
kedro new --starter="databricks-iris"
```
After the project is created, navigate to the newly created project directory:
```bash
cd <my-project-name>  # change directory
```
Install the required dependencies:
```bash
pip install -r requirements.txt
pip install kedro-databricks
```
Now you can initialize the Databricks Asset Bundle:
```bash
kedro databricks init
```
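The `init` step scaffolds the Asset Bundle configuration at the project root. As a rough sketch only (the plugin's actual output will differ; the host and bundle name below are placeholders), a minimal `databricks.yml` looks something like this:

```yaml
# Minimal Asset Bundle config sketch (illustrative, not the plugin's exact output)
bundle:
  name: my_project

targets:
  local:
    mode: development
    default: true
    workspace:
      host: https://<your-workspace>.cloud.databricks.com
```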
Next, generate the Asset Bundle resource definitions:
```bash
kedro databricks bundle
```
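This writes job definitions for your pipelines into the bundle. The shape below is only a hedged sketch following the Databricks Jobs API, with hypothetical task and entry-point names, not the plugin's exact output:

```yaml
# Illustrative job resource: one Databricks task per Kedro node (all names are placeholders)
resources:
  jobs:
    my_project:
      name: my_project
      tasks:
        - task_key: first_node
          python_wheel_task:
            package_name: my_project
            entry_point: main                     # hypothetical entry point
            parameters: ["--nodes", "first_node"]
        - task_key: second_node
          depends_on:
            - task_key: first_node
```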
Finally, deploy the Kedro project to Databricks:
```bash
kedro databricks deploy
```
That's it! Your pipelines have now been deployed as a workflow to Databricks as `[dev user] project_name`. Try running the workflow to see the results. You're all set to start developing your Kedro pipelines on Databricks. For more detailed information and documentation, visit the GitHub repository: https://github.com/JenspederM/kedro-databricks
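If you prefer the terminal to the Workflows UI, a deployed bundle job can also be triggered with the standard Databricks CLI; the resource key here is a placeholder, so check your generated resources for the real one:

```bash
databricks bundle run --target local my_project
```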
j
amazing work @Jens Peder Meldgaard!! šŸ™ŒšŸ¼
y
CC: @Richard Purvis, @Yaroslav Starukhin
r
Super cool, will have to try it out!
d
HERO!
ā¤ļø 1
n
Hello @Jens Peder Meldgaard, thanks for the plugin! I've just tested it and got an exception after a long wait at this step:
Uploading bundle files to ...
```
Exception: Deploying to Databricks: Command '['databricks', 'bundle', 'deploy', '--target', 'local']' returned non-zero exit status 1.
```
I can confirm the bundle is created in `Home/.bundle`, but there is no workflow in Workflows.
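A general debugging note for failures like this: rerunning the command from the exception directly in a terminal usually surfaces the full error from the Databricks CLI, and `validate` is a standard bundle subcommand that can catch configuration problems first:

```bash
databricks bundle validate --target local  # check the bundle configuration
databricks bundle deploy --target local    # the same command the plugin runs
```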
j
Sent PM
p
Thank you @Jens Peder Meldgaard! I have been trying out your plugin today. It sucks that Databricks limits a single job to a max of 100 tasks. I have more than 100 nodes in my default pipeline T.T, which is why it is crashing...
j
Yeah, I didn't really test it out with pipelines that size
j
Should we think of grouping the tasks then, like we do with `kedro-airflow`? cc @Ankita Katiyar @Simon Brugman @datajoely
šŸ‘ 1
šŸ‘šŸ¼ 1
p
That would be a great add-on.
I was also suggesting another feature: setting `cluster_id` at init time, in case people want to leverage an existing cluster.
j
When I first tested the plugin, I was able to set the cluster ID by tweaking the config files.
p
Yes, we can set the cluster ID by tweaking the config files.
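For illustration, attaching tasks to an existing cluster uses the `existing_cluster_id` field from the Databricks Jobs API. A hedged sketch of such a tweak, with placeholder names and ID:

```yaml
# Point a generated job task at an existing cluster (placeholder values)
resources:
  jobs:
    my_project:
      tasks:
        - task_key: first_node
          existing_cluster_id: "0123-456789-abcdefgh"
```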
Can probably also give an option to use `--pipeline` instead of `--nodes`. That would help group tasks together. There's a downside, but it's also an option the user can benefit from.
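For context, the two flags change what each generated Databricks task would execute. A hedged sketch with a hypothetical node name (`__default__` is Kedro's name for the default pipeline):

```bash
# One task per Kedro node: fine-grained, but runs into the 100-task job limit
kedro run --nodes first_node

# One task per pipeline: groups everything into a single task
kedro run --pipeline __default__
```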
j
Opened an issue: https://github.com/JenspederM/kedro-databricks/issues/32. I'd say let's continue the conversation there!
ā¤ļø 1
l
Thanks!