Hello everyone I have been studying ways to use dbx using the kedro template (Kedro #questions)

# questions

Manilson António Lussati

12/09/2022, 2:19 AM

Hello everyone, I have been studying ways to use dbx with the Kedro template. Have any of you gone through this?

Yetunde

12/09/2022, 9:30 AM

Yes! I'm going to tag @Pedro Abreu and @poornima p here.

Vitor Avancini

01/31/2023, 1:17 PM

Hello, I've started trying this out over the past few days. Have you taken this any further yourself? I'm trying to find a good workflow here, but I haven't got there yet. I've managed to make it work, but it's not great yet

Yetunde

01/31/2023, 1:28 PM

@Vitor Avancini What challenges are you running into? We've got a ticket in our backlog to document this workflow. It would be great if you could leave feedback here: https://github.com/kedro-org/kedro/issues/2185

Vitor Avancini

01/31/2023, 1:42 PM

Ok, I'll try to gather some thoughts and write there, thanks Yetunde!

Vitor Avancini

01/31/2023, 1:46 PM

One thing I'm having some trouble with is the conf folder: when I run in Databricks those directories are missing and execution fails. I've made it work by packaging the configs and fixing the conf dir in settings.py

👍 1

Vitor Avancini

01/31/2023, 1:46 PM

but it doesn't seem very elegant
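A minimal sketch of what "fixing the conf dir in settings.py" could look like, assuming the project's conf folder has been copied into the Python package before building so that it travels with the wheel. `CONF_SOURCE` is the setting Kedro reads from settings.py; the helper name `packaged_conf_source` and the bundled layout are hypothetical, not what was actually used in this thread.

```python
"""Sketch of a settings.py tweak; the bundled conf layout is an assumption."""
from pathlib import Path


def packaged_conf_source(settings_file: str, conf_dirname: str = "conf") -> str:
    """Return the path of a conf directory sitting next to settings.py.

    Hypothetical helper: it assumes the project's conf folder was copied
    into the Python package (for example as package data) before building.
    """
    return str(Path(settings_file).resolve().parent / conf_dirname)


# In src/<package_name>/settings.py one would then write:
# CONF_SOURCE = packaged_conf_source(__file__)
```

Resolving the path relative to settings.py rather than the working directory means the lookup does not depend on where the job is launched from on the cluster.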

Yetunde

01/31/2023, 1:48 PM

Ah! You're using `dbx deploy`? We're aware of this problem and it's actually been addressed and will be shipped in this sprint. Have a look at this GitHub issue: https://github.com/kedro-org/kedro/issues/1908

🤟🏿 1

Vitor Avancini

01/31/2023, 1:48 PM

nice, will take a look!

Vitor Avancini

01/31/2023, 1:49 PM

right now i'm using dbx execute for the development workflow

Vitor Avancini

01/31/2023, 1:49 PM

it almost works nicely, but it takes some time to install the package on Databricks and it does the installation on every run

Vitor Avancini

01/31/2023, 1:50 PM

if I remove the dependencies after the first run, it goes a lot faster

Vitor Avancini

01/31/2023, 1:50 PM

thinking of maybe writing some cache-like check for the setup.py install
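One way such a cache-like check might be sketched: hash the requirements, store the hash in a marker file, and skip the install when nothing has changed. All function names and the marker-file approach are assumptions for illustration; the thread does not say how the check was (or would be) implemented, and a marker kept on cluster-local disk would be lost whenever the cluster restarts.

```python
import hashlib
import subprocess
import sys
from pathlib import Path


def requirements_fingerprint(requirements_text: str) -> str:
    """Hash the requirements text so that changes can be detected cheaply."""
    return hashlib.sha256(requirements_text.strip().encode("utf-8")).hexdigest()


def needs_install(requirements_text: str, marker: Path) -> bool:
    """Return True when no marker exists or the stored hash differs."""
    if not marker.exists():
        return True
    return marker.read_text().strip() != requirements_fingerprint(requirements_text)


def install_if_changed(requirements_file: Path, marker: Path) -> bool:
    """Run pip only when the requirements changed; return True if it ran."""
    text = requirements_file.read_text()
    if not needs_install(text, marker):
        return False
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "-r", str(requirements_file)]
    )
    # Record the hash only after a successful install.
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(requirements_fingerprint(text))
    return True
```

The marker is written only after pip succeeds, so a failed install is retried on the next run.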

Yetunde

02/28/2023, 3:40 PM

@Jannic Holzer is looking into `dbx` and Kedro at a later stage. So we're going to come back to you two. Has everything been okay?

Vitor Avancini

02/28/2023, 5:11 PM

I've managed to make it work, but it feels hacky at some points

Vitor Avancini

02/28/2023, 5:12 PM

for developing I went back to notebooks. I've created a notebook 'kedro_runner', which you run specifying your pipeline or any kedro run CLI argument
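The 'kedro_runner' notebook itself is not shown in the thread. A rough sketch of how such a runner could be built on Kedro's session API follows; the helper names `build_run_kwargs` and `run_kedro` are hypothetical, and only the pipeline name and tags are handled here rather than every `kedro run` argument.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def build_run_kwargs(pipeline: Optional[str] = None, tags: str = "") -> Dict[str, Any]:
    """Translate notebook parameters into keyword arguments for session.run."""
    kwargs: Dict[str, Any] = {}
    if pipeline:
        kwargs["pipeline_name"] = pipeline
    # Tags arrive as one comma-separated string, as they would on the CLI.
    tag_list = [t.strip() for t in tags.split(",") if t.strip()]
    if tag_list:
        kwargs["tags"] = tag_list
    return kwargs


def run_kedro(project_path: str, env: Optional[str] = None, **run_kwargs: Any) -> Any:
    """Bootstrap the Kedro project at project_path and run it in a session."""
    # Imported lazily so build_run_kwargs works without Kedro installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    root = Path(project_path)
    bootstrap_project(root)
    with KedroSession.create(project_path=root, env=env) as session:
        return session.run(**run_kwargs)
```

In a notebook cell this might be called as `run_kedro(project_root, **build_run_kwargs(pipeline="my_pipeline"))`, where `project_root` and `"my_pipeline"` are placeholders for the project's location on the cluster and one of its registered pipelines.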

Vitor Avancini

02/28/2023, 5:13 PM

I was using 'dbx run' to try to keep everything in VS Code, but it is a bit slow, as we have to package everything and install it on the Databricks cluster every time. Between when I tested this out and now, Databricks launched a VS Code extension, but I haven't had the time to test it out

Manilson António Lussati

03/01/2023, 10:39 PM

Understood. How do you maintain the Kedro scaffolding when you run dbx? The problem I am facing is that when I run kedro package, the conf and data paths are not respected inside the Databricks cluster.
