#questions

Zihao Xu

11/21/2022, 11:16 PM
Hi team, we are trying to use the experiment tracking feature of kedro within databricks, but are running into the following error:
INFO     Loading data from 'modeling.model_best_params_' (JSONDataSet)...   data_catalog.py:343

DataSetError: Loading not supported for 'JSONDataSet'
where we have the following catalog entry:
modeling.model_best_params_:
  type: tracking.JSONDataSet
  filepath: "${folders.tracking}/model_best_params.json"
  layer: reporting
The same code runs completely fine locally, but is failing within Databricks. Could you please help us understand why?
Using kedro, version 0.18.3
Also, in both our local test and Databricks test, we use the same requirements.txt to create virtual envs.

datajoely

11/22/2022, 10:01 AM
Is your $folders.tracking local / dbfs / s3?

Zihao Xu

11/22/2022, 2:50 PM
$folders.tracking is a dbfs store on Databricks; locally it’s within data/09_tracking

datajoely

11/22/2022, 3:03 PM
and are you using databricks repos?

Zihao Xu

11/22/2022, 4:53 PM
Yes, I am using databricks repos
but the data IO is handled by dbfs. Everything was fine (i.e., saving / loading of data) until we brought in the tracking.JSONDataSet

datajoely

11/22/2022, 5:28 PM
Oh sorry I get what you’re trying to do
DataSetError: Loading not supported for 'JSONDataSet'
you don’t load it
it’s for visualising in the Viz experimentation tracking view
not using as data

Zihao Xu

11/22/2022, 5:29 PM
yes, this dataset is used in viz for tracking purposes. But it is also an input to a node within our modeling pipeline - hence it needs to be loaded by that node

datajoely

11/22/2022, 5:29 PM
it’s not designed for that

Zihao Xu

11/22/2022, 5:29 PM
The interesting thing is, the same setup works totally fine locally..

datajoely

11/22/2022, 5:29 PM
you would have to duplicate the return and use the duplicate one
DataSetError: Loading not supported for 'JSONDataSet'
The dataset isn’t going to behave differently on different deploys

Zihao Xu

11/22/2022, 5:30 PM
by “duplicate the return”, do you mean have a separate node just for tracking purposes?

datajoely

11/22/2022, 5:30 PM
no, just return the attribute twice in the same node
and label one of the outputs for experiment tracking
and the other as data referenced as an input to the next node
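A minimal sketch of that suggestion, assuming a tuning node and a downstream training node. Only the dataset name modeling.model_best_params_ comes from the catalog entry above; the function names, the input and output names, and the second output modeling.model_best_params_data are hypothetical. The second output has no catalog entry, so Kedro would keep it in memory and the next node can load it, while the first goes to the save-only tracking dataset.

```python
from typing import Any, Dict, Tuple


def tune_model(train_data: Any) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return the best parameters twice: once for tracking, once as data."""
    # Placeholder values; a real node would run a hyperparameter search here.
    best_params = {"max_depth": 3, "learning_rate": 0.1}
    # The second value is an independent copy, so downstream changes
    # cannot affect the tracked values.
    return best_params, dict(best_params)


def train_final_model(train_data: Any, best_params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical downstream node that consumes the loadable copy."""
    return {"params": best_params}


def create_pipeline():
    # Imported here so the functions above can be tried without Kedro installed.
    from kedro.pipeline import Pipeline, node

    return Pipeline(
        [
            node(
                func=tune_model,
                inputs="modeling.train_data",  # hypothetical input name
                outputs=[
                    # tracking.JSONDataSet from the catalog (save only)
                    "modeling.model_best_params_",
                    # hypothetical name; no catalog entry, so held in memory
                    "modeling.model_best_params_data",
                ],
                name="tune_model",
            ),
            node(
                func=train_final_model,
                inputs=["modeling.train_data", "modeling.model_best_params_data"],
                outputs="modeling.final_model",  # hypothetical output name
                name="train_final_model",
            ),
        ]
    )
```

With this wiring nothing ever asks the tracking dataset to load; it is only written to and then read by Kedro-Viz.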

Zihao Xu

11/22/2022, 5:31 PM
I see. So one used as inputs for the next node (likely will be a memory dataset) and another just for tracking
👍 1
got it - will give this a try. Thanks @datajoely

Nok Lam Chan

11/22/2022, 9:40 PM
The dataset doesn’t support load, so it shouldn’t work locally either
👍 1
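To illustrate what "doesn’t support load" means, here is a rough sketch of a save-only dataset in plain Python. It is not Kedro's source code: the class names WriteOnlyJSON and LoadNotSupportedError are made up for illustration, and only the error message mirrors the one in the log above.

```python
import json
from pathlib import Path
from typing import Any, Dict


class LoadNotSupportedError(Exception):
    """Hypothetical stand-in for the DataSetError seen in the log."""


class WriteOnlyJSON:
    """Illustrative save-only dataset: it writes JSON but refuses to read it."""

    def __init__(self, filepath: str) -> None:
        self._filepath = Path(filepath)

    def save(self, data: Dict[str, Any]) -> None:
        # Create the parent folder if needed, then write the JSON payload.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        self._filepath.write_text(json.dumps(data))

    def load(self) -> None:
        # Loading is rejected unconditionally, whatever the filepath points at.
        raise LoadNotSupportedError(
            f"Loading not supported for '{type(self).__name__}'"
        )
```

Because load fails unconditionally in a design like this, the storage location (local folder or dbfs) would make no difference to the error.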