# questions
z
Hi team, we are trying to use the experiment tracking feature of kedro within databricks, but are running into the following error:
INFO     Loading data from 'modeling.model_best_params_' (JSONDataSet)...    data_catalog.py:343

DataSetError: Loading not supported for 'JSONDataSet'
where we have the following catalog entry:
modeling.model_best_params_:
  type: tracking.JSONDataSet
  filepath: "${folders.tracking}/model_best_params.json"
  layer: reporting
The same code runs completely fine locally, but is failing within Databricks. Could you please help us understand why?
Using kedro, version 0.18.3. Also, in both our local test and Databricks test, we use the same requirements.txt to create virtual envs.
d
Is your $folders.tracking local / dbfs / s3?
z
$folders.tracking is a dbfs store on Databricks; locally it’s within data/09_tracking
d
and are you using databricks repos?
z
Yes, I am using databricks repos
but the data IO is handled by dbfs. Everything was fine (i.e., saving / loading of data) until we brought in the tracking.JSONDataSet
d
Oh sorry I get what you’re trying to do
DataSetError: Loading not supported for 'JSONDataSet'
you don’t load it
it’s for visualising in the Kedro-Viz experiment tracking view
not for using as data
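the dataset only implements save, e.g. a minimal sketch (the filepath and params here are made up for illustration):

from kedro.extras.datasets.tracking import JSONDataSet

ds = JSONDataSet(filepath="data/09_tracking/model_best_params.json")
ds.save({"max_depth": 5})  # saving works; Kedro-Viz picks these files up for experiment tracking
ds.load()                  # raises DataSetError: Loading not supported for 'JSONDataSet'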
z
yes, this dataset is used in viz for tracking purposes. But it is also an input to a node within our modeling pipeline - hence it needs to be loaded by that node
d
it’s not designed for that
z
The interesting thing is, the same setup works totally fine locally…
d
you would have to duplicate the return and use the duplicate one
DataSetError: Loading not supported for 'JSONDataSet'
The dataset isn’t going to behave differently on different deploys
z
by “duplicate the return”, do you mean have a separate node just for tracking purposes?
d
no, just return the attribute twice in the same node
and label one of the outputs for experiment tracking
and the other as the data input to the next node
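something like this (just a rough sketch; every name other than modeling.model_best_params_ is made up):

from kedro.pipeline import Pipeline, node, pipeline

def tune_model(model_input_table):
    # hypothetical tuning step
    best_params = {"max_depth": 5, "n_estimators": 100}
    # return the same object twice: one copy goes to the tracking dataset,
    # the other stays a normal (memory) dataset that downstream nodes can load
    return best_params, best_params

def train_model(model_input_table, best_params):
    ...  # hypothetical training step

def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                tune_model,
                inputs="model_input_table",
                outputs=[
                    "modeling.model_best_params_",  # tracking.JSONDataSet in the catalog, write-only
                    "modeling.model_best_params",   # not in the catalog, so it stays a MemoryDataSet
                ],
                name="tune_model_node",
            ),
            node(
                train_model,
                inputs=["model_input_table", "modeling.model_best_params"],
                outputs="model",
                name="train_model_node",
            ),
        ]
    )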
z
I see. So one is used as the input for the next node (likely a memory dataset) and the other just for tracking
👍 1
got it - will give this a try. Thanks @datajoely
n
The dataset doesn’t support load, so it shouldn’t work locally either
👍 1