Juan David Patiño Guerra
12/13/2023, 9:48 AM
(screenshot of the error attached)

datajoely
12/13/2023, 10:06 AM
`filepath does not exist or not accessible` at the very top of your screenshot

datajoely
12/13/2023, 10:06 AM

datajoely
12/13/2023, 10:07 AM

Juan David Patiño Guerra
12/13/2023, 10:47 AM

Michał Madej
12/13/2023, 11:20 AM

Michał Madej
12/13/2023, 11:23 AM
Juan David Patiño Guerra
12/13/2023, 12:53 PM
I set `spark_version: 13.3.x-cpu-ml-scala2.12` and I get the (new) error shown attached - I made screenshots of the full trace (even more cryptic). Any ideas? Thanks for your quick help so far, guys!
datajoely
12/13/2023, 12:59 PM
`spark.yaml`?
Juan David Patiño Guerra
12/13/2023, 1:08 PM
`databricks.yaml` from the DABs? If so, I think that one needs to stay in the root of the repo.
datajoely
12/13/2023, 1:09 PM
`value cannot be null for spark.app.name`

datajoely
12/13/2023, 1:10 PM
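The `value cannot be null for spark.app.name` error suggests the SparkSession is being built without an app name. For reference, the Spark hook in the Kedro PySpark docs sets one via `appName(...)`; a minimal sketch along those lines, assuming the `spark` config patterns are registered with the project's config loader:

```python
from kedro.framework.hooks import hook_impl
from pyspark import SparkConf
from pyspark.sql import SparkSession


class SparkHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        """Initialise a SparkSession using settings from conf/**/spark.yml."""
        # Assumes "spark": ["spark*", "spark*/**"] has been added to the
        # config loader's config_patterns in settings.py, as the docs show.
        parameters = context.config_loader["spark"]
        spark_conf = SparkConf().setAll(parameters.items())

        spark_session = (
            SparkSession.builder.appName(context.project_path.name)
            .config(conf=spark_conf)
            .getOrCreate()
        )
        spark_session.sparkContext.setLogLevel("WARN")
```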
Michał Madej
12/13/2023, 2:40 PM
`/Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.target}/${bundle.name}`, try using it in `parameters: ["--conf-source", "here", ...]`
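A sketch of how that suggestion could be wired up: the `parameters` list of a Databricks job task is forwarded to the packaged project's entry point, so the equivalent call looks like the snippet below. The package name `wine_model_kedro` is borrowed from later in the thread, and the bundle path placeholders are assumptions:

```python
# Hypothetical invocation mirroring parameters: ["--conf-source", ...] in the
# job definition; the generated main() forwards its argument list to kedro run.
from wine_model_kedro.__main__ import main

main([
    "--conf-source",
    "/Workspace/Users/<user>/.bundle/<target>/<bundle_name>/conf",  # assumed path
])
```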
Juan David Patiño Guerra
12/14/2023, 9:45 AM
I replaced `SparkSession.builder.appName(context._package_name)` with `SparkSession.builder.appName(context.project_path.name)`, which is what is shown in the latest documentation. I do end up (again) in the same config error from above, which says `ValueError: Given configuration path either does not exist or is not a valid directory: /databricks/driver/conf/base`.
After diving into it in more detail, I see that it gets into the MLflow tracking hook and fails when using the `ConfigLoader` (see the traceback in the screenshot attached). The code that it is calling is the following:
```python
from typing import Any, Dict

import mlflow
from kedro.config import ConfigLoader
from kedro.framework.hooks import hook_impl
from kedro.framework.project import settings


class MLFlowTrackingHooks:
    """Namespace for grouping all model-tracking hooks with MLflow together."""

    def load_parameters(self, run_params):
        # Build a ConfigLoader rooted at the project's conf folder.
        project_path = run_params["project_path"]
        conf_loader = ConfigLoader(conf_source=f"{project_path}/{settings.CONF_SOURCE}")
        parameters = conf_loader.get("parameters*", "parameters*/**")
        return parameters

    @hook_impl
    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        """Hook implementation to start an MLflow run
        with the session_id of the Kedro pipeline run.
        """
        parameters = self.load_parameters(run_params)
        experiment_name = parameters["mlflow_experiment_name"]
        mlflow.set_experiment(experiment_name)
        exp_id = mlflow.get_experiment_by_name(experiment_name).experiment_id
        mlflow.start_run(run_name=run_params["session_id"], experiment_id=exp_id)
        mlflow.log_params(run_params)
```
Does the type of config loader have something to do here? I'm still using DBR 13.3 LTS ML. Thanks again for thinking along.
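The loader type turned out not to be the culprit, as the follow-up below shows. For reference, newer Kedro versions replace `ConfigLoader` with `OmegaConfigLoader`, which accepts the same `conf_source` argument; a minimal sketch, assuming Kedro 0.18.5+ and the DBFS conf path mentioned at the end of the thread:

```python
from kedro.config import OmegaConfigLoader

# Same conf_source mechanics as ConfigLoader; the path below is the DBFS
# folder Juan mentions in the final message.
loader = OmegaConfigLoader(conf_source="/dbfs/FileStore/wine_model_kedro/conf")
parameters = loader["parameters"]  # merged from parameters*.yml files
```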
Juan David Patiño Guerra
12/18/2023, 10:46 AM
The problem was the `ConfigLoader` call in `hooks.py`. Because the project path resolves to the `databricks/driver` folder in Databricks jobs, it was failing there. The solution was to point the `ConfigLoader` to the config folder on DBFS that I copied it to when I had to run the pipeline.
After that it works! I hope that with the development and growth of Kedro, deployment as Databricks jobs gets better and smoother!
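A sketch of what that fix could look like in the `MLFlowTrackingHooks.load_parameters` method shown earlier; hardcoding the DBFS path (taken from the final message) is just for illustration:

```python
from kedro.config import ConfigLoader

# Assumed location of the conf folder copied to DBFS (see the last message).
DBFS_CONF_SOURCE = "/dbfs/FileStore/wine_model_kedro/conf"


class MLFlowTrackingHooks:
    def load_parameters(self, run_params):
        # Read config from DBFS rather than from run_params["project_path"],
        # which resolves under /databricks/driver in a Databricks job.
        conf_loader = ConfigLoader(conf_source=DBFS_CONF_SOURCE)
        return conf_loader.get("parameters*", "parameters*/**")
```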
datajoely
12/18/2023, 10:52 AM

datajoely
12/18/2023, 10:52 AM
Juan David Patiño Guerra
12/18/2023, 11:14 AM
The error was `ValueError: Given configuration path either does not exist or is not a valid directory: /databricks/driver/conf/base`. It's just that seeing it fail in that directory "felt" complex, but in the trace you could see that it was going through the hooks. It is a bit of Databricks extra complexity that didn't help.
datajoely
12/18/2023, 12:12 PM

Juan David Patiño Guerra
12/18/2023, 12:23 PM
`/dbfs/FileStore/wine_model_kedro/conf/`
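For completeness, a hypothetical one-off copy of the project's `conf` folder to that DBFS location, runnable from a Databricks notebook; the source path is an assumption about where the repo is checked out:

```python
# dbutils is provided by the Databricks runtime inside notebooks.
dbutils.fs.cp(
    "file:/Workspace/Repos/<user>/<repo>/conf",  # assumed checkout location
    "dbfs:/FileStore/wine_model_kedro/conf",     # target used in this thread
    recurse=True,
)
```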