#questions

Juan David Patiño Guerra

12/13/2023, 9:48 AM
Hi all, I am running Kedro as a Job in Databricks and I am getting the error in the attached screenshot. It tries to find the configuration in databricks/driver/conf/base, even though the config location is passed (correctly, I hope) in the sys args, as shown in the second attachment. I'm using Databricks Asset Bundles to run it (it's like dbx, but officially supported by Databricks). Thanks in advance for your help! FYI, I have been asking related questions in this thread as well: https://kedro-org.slack.com/archives/C03RKP2LW64/p1702289803851329

datajoely

12/13/2023, 10:06 AM
So DABs are very new, but the very top of your screenshot says that the
filepath does not exist or is not accessible
so we just need to work out what the process can see from its working directory.
Perhaps a little script that prints the cwd, or even the file tree, will help diagnose what the filepath needs to be.
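A minimal diagnostic along those lines, using only the standard library (the helper name is mine, not from the thread), that a Databricks task could run to print its working directory and a shallow file tree:

```python
import os
from pathlib import Path


def list_tree(base: Path, max_depth: int = 2) -> list[str]:
    """Return a shallow listing of folders and files under `base`."""
    entries = []
    for root, dirs, files in os.walk(base):
        rel = Path(root).relative_to(base)
        if len(rel.parts) >= max_depth:
            dirs.clear()  # stop descending past max_depth levels
        entries.append(str(rel) + "/")
        entries.extend(str(rel / f) for f in files)
    return entries


# On a Databricks job this typically prints /databricks/driver, which is
# why a relative conf/ path ends up resolving to /databricks/driver/conf.
print("cwd:", Path.cwd())
print("\n".join(list_tree(Path.cwd())))
```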

Juan David Patiño Guerra

12/13/2023, 10:47 AM
This is probably where it is trying to look (see attachment). But I wonder what is happening in the background that makes it look for the config there instead of the location I pointed to. I have tried to follow the steps here as closely as possible, so I would hope it doesn't fail: https://docs.kedro.org/en/0.18.14/deployment/databricks/databricks_deployment_workflow.html On the other hand, I believe DABs work pretty similarly to dbx in the background, so I would hope this is not the issue.

Michał Madej

12/13/2023, 11:20 AM
You're using an older runtime (13.0 or below); you may need to add "experimental: python_wheel_wrapper: true" to the top of your databricks.yml file
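For reference, that setting sits at the top level of databricks.yml; roughly like this (the bundle name is a placeholder, only the experimental block comes from the thread):

```yaml
# databricks.yml (top level)
bundle:
  name: my_kedro_project  # placeholder

experimental:
  # Wraps the Python wheel task so it works on runtimes <= 13.0
  python_wheel_wrapper: true
```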

Juan David Patiño Guerra

12/13/2023, 12:53 PM
Thanks @Michał Madej, I tried the "experimental: python_wheel_wrapper: true" proposed in the link, but it didn't work. I upgraded to
spark_version: 13.3.x-cpu-ml-scala2.12
and now I get the (new) error shown in the attachment; I made screenshots of the full trace (even more cryptic). Any ideas? Thanks for your quick help so far, guys!

datajoely

12/13/2023, 12:59 PM
have you packaged your
spark.yaml
?

Juan David Patiño Guerra

12/13/2023, 1:08 PM
Not really... and I'm not aware of having to do that from the instructions. I believe you mean
databricks.yaml
from the DABs? If so, I think that one needs to stay in the root of the repo.

datajoely

12/13/2023, 1:09 PM
value cannot be null for spark.app.name
so the Kedro Spark app name is usually defined in the SparkHooks

Michał Madej

12/13/2023, 2:40 PM
I don't know about Spark, but DAB uploads your configuration directory to this location:
/Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.target}/${bundle.name}
, try using it in
parameters: ["--conf-source", "here", ...]
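Concretely, that would look something like this in the job task definition in databricks.yml (the task key, package name, and entry point are placeholders; only the --conf-source value is from this thread):

```yaml
tasks:
  - task_key: run_kedro            # placeholder
    python_wheel_task:
      package_name: my_kedro_project   # placeholder
      entry_point: my-kedro-project    # placeholder
      parameters:
        - "--conf-source"
        - "/Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.target}/${bundle.name}/conf"
```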

Juan David Patiño Guerra

12/18/2023, 10:46 AM
Hi all, I just wanted to share the solution I found in the end. The issue was that the ConfigLoader was trying to read the config folder from its default location at the project root; this was happening in an MLflow hook in
hooks.py
. Because the project path in Databricks jobs is the
databricks/driver
folder, it was failing there. The solution was to point the ConfigLoader at the config folder in DBFS that I had copied it to in order to run the pipeline. After that it works! I hope that deploying Kedro as Databricks jobs keeps getting better and smoother as the project develops and grows!
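The fallback described above can be sketched without any Kedro imports (pure standard library; the function name is mine, the two paths are the ones from this thread):

```python
from pathlib import Path


def resolve_conf_source() -> str:
    """Pick the conf/ folder the ConfigLoader should read.

    On a Databricks job the working directory is /databricks/driver, so a
    relative "conf" resolves to /databricks/driver/conf and fails. Prefer
    the copy uploaded to DBFS when it exists.
    """
    dbfs_conf = Path("/dbfs/FileStore/wine_model_kedro/conf")
    if dbfs_conf.is_dir():
        return str(dbfs_conf)
    # Local development: fall back to conf/ next to the project root.
    return str(Path.cwd() / "conf")
```

A hook could then pass the result of resolve_conf_source() as the conf_source when it builds its config loader, instead of relying on the default relative path.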

datajoely

12/18/2023, 10:52 AM
That’s super helpful thanks for the update
I’d really like to think about how we could provide a better error message

Juan David Patiño Guerra

12/18/2023, 11:14 AM
I think I just had to look properly at the error message to see that it was jumping into the hook and failing there; the different layers made it a bit distracting to catch. In the end this was true the whole time:
ValueError: Given configuration path either does not exist or is not a valid directory: /databricks/driver/conf/base
It's just that seeing it fail in that directory "felt" complex, though in the trace you could see it was going through the hooks. That's a bit of extra Databricks complexity that didn't help.

datajoely

12/18/2023, 12:12 PM
and for reference what was the correct directory?

Juan David Patiño Guerra

12/18/2023, 12:23 PM
I used this location in DBFS to copy the config folder so it could be read:
/dbfs/FileStore/wine_model_kedro/conf/