Jens Peder Meldgaard
07/19/2024, 4:57 PM
kedro-databricks: A New Kedro Plugin for Seamless Databricks Integration 🚀
Developing pipelines on Databricks just got a whole lot easier! I am excited to introduce kedro-databricks, a powerful new plugin designed to enhance your Kedro experience on Databricks. This plugin provides a streamlined, efficient, and developer-friendly approach to deploying and managing Kedro pipelines on the Databricks platform.
## Key Features:
⢠Initialization: Transform your local Kedro project into a Databricks Asset Bundle project with a single command.
⢠Generation: Effortlessly generate Asset Bundle resources definitions.
⢠Deployment: Simplify the deployment of your Kedro projects to Databricks.
# How to get started
## Prerequisites:
Before you begin, ensure that the Databricks CLI is installed and configured. For more information on installation and configuration, please refer to the Databricks CLI documentation.
• Installation Help
• Configuration Help
## Creating a new project
Before creating a new project, make sure Kedro is installed in a virtual environment. If it is not, install it with the following command:
pip install kedro
Initialize a new Kedro project with the databricks-iris starter with the following command:
kedro new --starter="databricks-iris"
After the project is created, navigate to the newly created project directory:
cd <my-project-name> # change directory
Install the required dependencies:
pip install -r requirements.txt
pip install kedro-databricks
Now you can initialize the Databricks Asset Bundle:
kedro databricks init
Next, generate the Asset Bundle resources definition:
kedro databricks bundle
Finally, deploy the Kedro project to Databricks:
kedro databricks deploy
That's it! Your pipelines have now been deployed as a workflow to Databricks as ``[dev user] project_name``. Try running the workflow to see the results.
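If you repeat these steps often, the three plugin commands above could be wrapped in a small script. The following is a rough sketch rather than anything the plugin ships: `run_step` and `deploy_project` are hypothetical helper names, and `DRY_RUN` is a convention of this script only, not a kedro-databricks option.

```shell
#!/usr/bin/env bash
# Sketch: chain the three plugin commands from the walkthrough.
# `run_step`, `deploy_project`, and DRY_RUN are assumptions of this example.

run_step() {
    # Print the command, then execute it unless DRY_RUN=1.
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

deploy_project() {
    # Stop at the first failing step so later steps never run on a broken state.
    run_step kedro databricks init &&
    run_step kedro databricks bundle &&
    run_step kedro databricks deploy
}
```

Running `DRY_RUN=1 deploy_project` from the project directory prints the three commands without executing them, and `deploy_project` runs them for real. Initialization is presumably a one-time step, so you may want to drop the `init` line once the project has been set up.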
You're all set to start developing your Kedro pipelines on Databricks. For more detailed information and documentation, visit the GitHub repository.
Juan Luis
07/19/2024, 5:25 PM
Yury Fedotov
07/19/2024, 5:35 PM
Richard Purvis
07/19/2024, 5:39 PM
datajoely
07/19/2024, 9:45 PM
Nicolas P
07/24/2024, 9:36 AM
Uploading bundle files to ...
Exception: Deploying to Databricks: Command '['databricks', 'bundle', 'deploy', '--target', 'local']' returned non-zero exit status 1.
I can confirm the bundle is created in Home/.bundle, but no workflow shows up in Workflows.
Jens Peder Meldgaard
07/24/2024, 9:48 AM
Puneet Saini
07/25/2024, 11:49 AM
Jens Peder Meldgaard
07/25/2024, 2:27 PM
Juan Luis
07/25/2024, 3:18 PM
Puneet Saini
07/25/2024, 3:19 PM
Puneet Saini
07/25/2024, 3:22 PM
Jens Peder Meldgaard
07/25/2024, 4:18 PM
Juan Luis
07/25/2024, 6:28 PM
Puneet Saini
07/25/2024, 6:53 PM
Puneet Saini
07/29/2024, 11:20 AM
--pipeline instead of --nodes. That can help in grouping tasks together. There's a downside, but it's also an option that users could benefit from.
Juan Luis
07/29/2024, 11:35 AM
Leonardo David Treiger Herszenhaut Brettas
08/10/2024, 5:27 PM