# questions
z
Hi team, we are trying to use the experiment tracking feature of kedro within databricks, but are running into the following error:
INFO     Loading data from 'modeling.model_best_params_' (JSONDataSet)...    data_catalog.py:343

DataSetError: Loading not supported for 'JSONDataSet'
where we have the following catalog entry:
modeling.model_best_params_:
  type: tracking.JSONDataSet
  filepath: "${folders.tracking}/model_best_params.json"
  layer: reporting
The same code runs completely fine locally, but is failing within Databricks. Could you please help us understand why?
Using kedro, version 0.18.3. Also, in both our local test and Databricks test, we use the same requirements.txt to create virtual envs.
d
Is your $folders.tracking local / dbfs / s3?
z
$folders.tracking is a dbfs store on Databricks; locally it’s within data/09_tracking
d
and are you using databricks repos?
z
Yes, I am using databricks repos
but the data IO is handled by dbfs. Everything was fine (i.e., saving / loading of data) until we brought in the tracking.JSONDataSet
d
Oh sorry I get what you’re trying to do
DataSetError: Loading not supported for 'JSONDataSet'
you don’t load it
it’s for visualising in the Kedro-Viz experiment tracking view
not for using as data
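the dataset only implements save, e.g. a minimal sketch (the filepath and params here are made up for illustration):

from kedro.extras.datasets.tracking import JSONDataSet

ds = JSONDataSet(filepath="data/09_tracking/model_best_params.json")
ds.save({"max_depth": 5})  # saving works; Kedro-Viz picks these files up for experiment tracking
ds.load()                  # raises DataSetError: Loading not supported for 'JSONDataSet'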
z
yes, this dataset is used in viz for tracking purposes. But it is also an input to a node within our modeling pipeline - hence it needs to be loaded by that node
d
it’s not designed for that
z
The interesting thing is, the same setup works totally fine locally…
d
you would have to duplicate the return and use the duplicate one
DataSetError: Loading not supported for 'JSONDataSet'
The dataset isn’t going to behave differently on different deploys
z
by “duplicate the return”, do you mean have a separate node just for tracking purposes?
d
no, just return the attribute twice in the same node
and label one of the outputs for experiment tracking
and the other as the data input to the next node
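something like this (just a rough sketch; every name other than modeling.model_best_params_ is made up):

from kedro.pipeline import Pipeline, node, pipeline

def tune_model(model_input_table):
    # hypothetical tuning step
    best_params = {"max_depth": 5, "n_estimators": 100}
    # return the same object twice: one copy goes to the tracking dataset,
    # the other stays a normal (memory) dataset that downstream nodes can load
    return best_params, best_params

def train_model(model_input_table, best_params):
    ...  # hypothetical training step

def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                tune_model,
                inputs="model_input_table",
                outputs=[
                    "modeling.model_best_params_",  # tracking.JSONDataSet in the catalog, write-only
                    "modeling.model_best_params",   # not in the catalog, so it stays a MemoryDataSet
                ],
                name="tune_model_node",
            ),
            node(
                train_model,
                inputs=["model_input_table", "modeling.model_best_params"],
                outputs="model",
                name="train_model_node",
            ),
        ]
    )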
z
I see. So one is used as the input for the next node (likely a memory dataset) and the other just for tracking
👍 1
got it - will give this a try. Thanks @datajoely
n
The dataset doesn’t support load, so it shouldn’t work locally either
👍 1