#questions

Manilson António Lussati

12/09/2022, 2:19 AM
Hello everyone, I have been studying ways to use dbx with the Kedro template. Have any of you gone through this?

Yetunde

12/09/2022, 9:30 AM
Yes! I'm going to tag @Pedro Abreu and @poornima p here.

Vitor Avancini

01/31/2023, 1:17 PM
Hello, I've started trying this out over the past few days. Have you ever evolved this yourself? I'm trying to find a good workflow here, but I haven't got there yet. I've managed to make it work, but it's not great yet.

Yetunde

01/31/2023, 1:28 PM
@Vitor Avancini What challenges are you running into? We've got a ticket in our backlog to document this workflow. It would be great if you could leave feedback here: https://github.com/kedro-org/kedro/issues/2185

Vitor Avancini

01/31/2023, 1:42 PM
Ok, I'll try to gather some thoughts and write there, thanks Yetunde!
One thing I'm having some trouble with is the conf folder: when I run in Databricks those directories are missing and execution fails. I've made it work by packaging the configs and fixing the conf dir in settings.py
👍 1
but it doesn't seem very elegant
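
The workaround described above (shipping the configs inside the package and pointing settings.py at them) might look roughly like the sketch below. It assumes the conf folder has been copied into the Python package and is included as package data; `packaged_conf_source` is a hypothetical helper name, while `CONF_SOURCE` is the Kedro setting that controls where configuration is loaded from.

```python
from pathlib import Path


def packaged_conf_source(settings_file: str, conf_dirname: str = "conf") -> str:
    """Return the path of a conf directory that sits next to settings.py.

    Assumption: the conf folder is shipped inside the installed package,
    in the same directory as settings.py.
    """
    return str(Path(settings_file).resolve().parent / conf_dirname)


# In <package_name>/settings.py this would be used as:
# CONF_SOURCE = packaged_conf_source(__file__)
```

Resolving the path relative to settings.py avoids depending on the working directory, which on a Databricks cluster is usually not the project root.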

Yetunde

01/31/2023, 1:48 PM
Ah! You're using dbx deploy? We're aware of this problem and it's actually been addressed and will be shipped in this sprint. Have a look at this GitHub issue: https://github.com/kedro-org/kedro/issues/1908

Vitor Avancini

01/31/2023, 1:48 PM
nice, will take a look!
right now I'm using dbx execute for the development workflow
it works almost nicely, but it takes some time to install the package on Databricks and it does the installation on every run
if I remove the dependencies after the first run, it goes a lot faster
I'm thinking of maybe writing some cache-like check for the setup.py install
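
One way such a cache-like check could work is to hash the dependency list and skip the install when the hash matches the one recorded after the last successful install. This is only a sketch of the idea, not something dbx provides: the function names, the requirements file, and the marker file location are all assumptions, and on a cluster the marker would need to live somewhere that survives between runs.

```python
import hashlib
from pathlib import Path


def needs_reinstall(requirements_file: Path, marker_file: Path) -> bool:
    """Return True when the dependency list changed since the last recorded install."""
    current = hashlib.sha256(requirements_file.read_bytes()).hexdigest()
    # No marker file means nothing has been recorded yet, so install.
    previous = marker_file.read_text().strip() if marker_file.exists() else ""
    return current != previous


def record_install(requirements_file: Path, marker_file: Path) -> None:
    """Store the hash of the dependency list after a successful install."""
    digest = hashlib.sha256(requirements_file.read_bytes()).hexdigest()
    marker_file.write_text(digest)
```

The install step would then be wrapped as "if `needs_reinstall(...)`: install, then `record_install(...)`", so unchanged dependencies are not reinstalled on every run.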

Yetunde

02/28/2023, 3:40 PM
@Jannic Holzer is looking into dbx and Kedro at a later stage. So we're going to come back to you two. Has everything been okay?

Vitor Avancini

02/28/2023, 5:11 PM
I've managed to make it work, but it feels hacky at some points
for developing I went back to notebooks: I've created a notebook 'kedro_runner', which you run specifying your pipeline or any kedro run CLI argument
I was using 'dbx run' to try to keep everything in VS Code, but it is a bit slow as we have to package everything and install it every time on the Databricks cluster. Between when I tested this out and now, Databricks launched a VS Code extension, but I haven't had the time to test it out
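
A 'kedro_runner' notebook along these lines could be sketched as follows. The actual notebook was not shared in the thread, so this is a guess at its shape: `build_run_kwargs` and `run_kedro` are hypothetical names, only the pipeline name and tags are handled, and the project sources are assumed to be available on the cluster at `project_path`.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def build_run_kwargs(pipeline: Optional[str] = None, tags: str = "") -> Dict[str, Any]:
    """Turn notebook parameters into keyword arguments for KedroSession.run."""
    kwargs: Dict[str, Any] = {}
    if pipeline:
        kwargs["pipeline_name"] = pipeline
    # Tags arrive as a comma-separated string, e.g. from a notebook widget.
    tag_list = [t.strip() for t in tags.split(",") if t.strip()]
    if tag_list:
        kwargs["tags"] = tag_list
    return kwargs


def run_kedro(project_path: str, env: Optional[str] = None, **run_kwargs: Any) -> Any:
    """Run a Kedro pipeline from a notebook cell."""
    # Imported lazily so build_run_kwargs can be used without Kedro installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    bootstrap_project(Path(project_path))
    with KedroSession.create(project_path=project_path, env=env) as session:
        return session.run(**run_kwargs)
```

A notebook cell would then call something like `run_kedro(project_path, **build_run_kwargs(pipeline="my_pipeline"))`, where the path and pipeline name are placeholders.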

Manilson António Lussati

03/01/2023, 10:39 PM
Understood. How do you maintain the Kedro scaffolding when you run dbx? The problem I am facing is that when I run kedro package, the conf and data paths are not respected inside the Databricks cluster.