Hello everyone I have been studying ways to use db...
# questions
m
Hello everyone, I have been studying ways to use dbx with the Kedro template. Has any of you gone through this?
y
Yes! I'm going to tag @Pedro Abreu and @poornima p here.
v
Hello, I've started trying this out over the past few days. Have you taken this any further yourself? I'm trying to figure out a good workflow here, but I haven't got there yet. I've managed to make it work, but it's not great yet.
y
@Vitor Avancini What challenges are you running into? We've got a ticket in our backlog to document this workflow. It would be great if you could leave feedback here: https://github.com/kedro-org/kedro/issues/2185
v
Ok, I'll try to gather some thoughts and write there, thanks Yetunde!
One thing I'm having some trouble with is the conf folder: when I run in Databricks those directories are missing and execution fails. I've made it work by packaging the configs and fixing the conf dir in settings.py
but it doesn't seem very elegant
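A minimal sketch of the workaround described above, assuming the conf folder is shipped inside the Python package as `<package>/conf` (that layout and the helper name `resolve_conf_source` are assumptions for illustration, not something confirmed in this thread). Kedro reads `CONF_SOURCE` from settings.py to locate the configuration folder, so the helper returns the packaged folder when it exists and falls back to the usual project-relative `conf` otherwise:

```python
from pathlib import Path


def resolve_conf_source(package_dir: Path, fallback: str = "conf") -> str:
    """Pick the packaged conf folder if present, else the project-relative default.

    `package_dir` is the directory that contains settings.py. The packaged
    `<package>/conf` layout is an assumption of this sketch.
    """
    packaged = package_dir / "conf"
    return str(packaged) if packaged.is_dir() else fallback


# In settings.py this would be used roughly as:
#   CONF_SOURCE = resolve_conf_source(Path(__file__).resolve().parent)
```

The fallback keeps local `kedro run` behaviour unchanged, while a wheel installed on the cluster would pick up the bundled copy, provided the conf files are actually included in the package data at build time.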
y
Ah! You're using `dbx deploy`? We're aware of this problem and it's actually been addressed and will be shipped in this sprint. Have a look at this GitHub issue: https://github.com/kedro-org/kedro/issues/1908
v
nice, will take a look!
right now I'm using dbx execute for the development workflow
it works almost nicely, but it takes some time to install the package on Databricks, and it does the installation on every run
if I remove the dependencies after the first run, it goes a lot faster
I'm thinking of maybe writing some cache-like check for the setup.py install
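One way such a cache-like check could look, as a rough sketch only: hash the dependency file and skip the install step when the hash matches the one recorded after the last successful install. The function names, the stamp file, and the idea of keying on a requirements file are all assumptions made for illustration; nothing here is part of dbx or Kedro.

```python
import hashlib
from pathlib import Path


def requirements_digest(requirements: Path) -> str:
    """Return a SHA-256 digest of the dependency file's contents."""
    return hashlib.sha256(requirements.read_bytes()).hexdigest()


def needs_install(requirements: Path, stamp: Path) -> bool:
    """True if no install was recorded yet or the dependencies have changed."""
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != requirements_digest(requirements)


def mark_installed(requirements: Path, stamp: Path) -> None:
    """Record the digest; call this only after the install has succeeded."""
    stamp.write_text(requirements_digest(requirements))
```

Recording the digest separately from the check means a failed install does not get cached by mistake. The stamp file would have to live somewhere that survives between runs on the cluster, which this sketch does not address.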
y
@Jannic Holzer is looking into `dbx` and Kedro at a later stage. So we're going to come back to you two. Has everything been okay?
v
I've managed to make it work, but it feels hacky at some points
for developing I went back to notebooks: I've created a notebook 'kedro_runner', which you run specifying your pipeline or any kedro run CLI argument
I was using 'dbx run' to try to keep everything in VS Code, but it is a bit slow, as we have to package everything and install it every time on the Databricks cluster. Between when I tested this out and now, Databricks launched a VS Code extension, but I haven't had the time to test it out
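The 'kedro_runner' notebook itself is not shown in this thread, so the following is only a guess at its shape, assuming a Kedro 0.18-style API (`bootstrap_project` and `KedroSession`). The helper names and the two supported options (a pipeline name and comma-separated tags) are hypothetical:

```python
from pathlib import Path
from typing import Any, Dict, Optional


def build_run_kwargs(pipeline: str = "", tags: str = "") -> Dict[str, Any]:
    """Turn notebook-style string inputs into keyword arguments for session.run()."""
    kwargs: Dict[str, Any] = {}
    if pipeline.strip():
        kwargs["pipeline_name"] = pipeline.strip()
    tag_list = [t.strip() for t in tags.split(",") if t.strip()]
    if tag_list:
        kwargs["tags"] = tag_list
    return kwargs


def run_kedro(project_path: str, env: Optional[str] = None, **run_kwargs: Any) -> Any:
    """Run a Kedro pipeline from a notebook cell (assumes Kedro 0.18-style API)."""
    # Imported lazily so build_run_kwargs() works even where Kedro is not installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    bootstrap_project(Path(project_path))
    with KedroSession.create(project_path=project_path, env=env) as session:
        return session.run(**run_kwargs)
```

In a notebook cell this could be called as `run_kedro("/path/to/project", **build_run_kwargs(pipeline="my_pipeline"))`, where the project path and pipeline name are placeholders.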
m
Understood. How do you maintain the Kedro scaffolding when you run dbx? The problem I am facing is that when I run kedro package, the conf and data paths are not respected inside the Databricks cluster.