Hi team! Is there any way to resolve factory data...
# questions
g
Hi team! Is there any way to resolve factory datasets and access them from a DataCatalog/KedroDataCatalog instance? I notice that using the CLI to create a list of datasets
kedro catalog list
will automatically resolve them (for a given pipeline - see this bit of code) while doing
catalog.list()
in a kedro jupyter notebook will just list non-factory datasets (and parameters). Are those two returning different outputs by design or is it a bug? Thanks!
g
For a bit of context, I noticed this while using vizro which has a kedro integration that relies on catalog.list. In practice, I would like to query by name a dataset defined by a factory dataset and get its load function.
d
So there is a trick for doing this before we fix this. Essentially .list() needs to match the patterns before they show up so you can do
catalog.list(Pipeline.inputs() | Pipeline.outputs())
g
Thanks @datajoely, it does not work out of the box:
from kedro.framework.project import pipelines
pipeline = pipelines.get("__default__")
catalog.list(pipeline.inputs() | pipeline.outputs())
returns
AttributeError: 'set' object has no attribute 'strip'
Seems like regex_search is supposed to be a string? If I pass `regex_search=".*KWD.*"`, where KWD is part of the name of one of my factory datasets, it also does not find it.
a
The factory datasets are lazy so they don’t show up in
catalog.list()
(Discussion in https://github.com/kedro-org/kedro/issues/3312) With the new catalog you can do -
catalog["<dataset_name>"]
And it’ll resolve and get you the factory dataset. Alternatively, touch each dataset in the pipeline so that it gets resolved:
for dataset in pipelines['__default__'].datasets():
    catalog.exists(dataset)  # or catalog.get_dataset(dataset)

# now it'll show up
catalog.list()
g
Thank you @Ankita Katiyar! That's perfect. PS: I opened an issue to fix this on the vizro side.
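For anyone landing here later, the lazy behaviour described in this thread can be pictured with a small toy model. This is only a sketch and not Kedro's actual implementation: `ToyCatalog`, `pattern_to_regex`, the `{name}_csv` pattern and the dataset names are all hypothetical, and it assumes (as the thread suggests) that checking whether a dataset exists is enough to resolve it and register it.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a factory-style pattern such as "{name}_csv" into a compiled regex."""
    # Split on placeholders but keep them in the result, e.g. ['', '{name}', '_csv'].
    parts = re.split(r"(\{\w+\})", pattern)
    regex = "".join(
        f"(?P<{part[1:-1]}>.+)" if part.startswith("{") else re.escape(part)
        for part in parts
    )
    return re.compile(f"^{regex}$")


class ToyCatalog:
    """Hypothetical stand-in for a data catalog (NOT Kedro's DataCatalog)."""

    def __init__(self, datasets: dict, patterns: dict):
        self._datasets = dict(datasets)  # explicitly declared entries
        self._patterns = dict(patterns)  # factory pattern -> config template

    def list(self) -> list:
        # Only entries that are already materialised are listed.
        return sorted(self._datasets)

    def exists(self, name: str) -> bool:
        return self._resolve(name) is not None

    def _resolve(self, name: str):
        if name in self._datasets:
            return self._datasets[name]
        for pattern, template in self._patterns.items():
            match = pattern_to_regex(pattern).match(name)
            if match:
                # Fill the placeholders in the template and register the result.
                config = {
                    key: value.format(**match.groupdict())
                    for key, value in template.items()
                }
                self._datasets[name] = config
                return config
        return None


catalog = ToyCatalog(
    datasets={"companies": {"type": "pandas.CSVDataset"}},
    patterns={
        "{name}_csv": {"type": "pandas.CSVDataset", "filepath": "data/{name}.csv"}
    },
)

print(catalog.list())      # ['companies']
catalog.exists("cars_csv")  # touching the name resolves the pattern
print(catalog.list())      # ['cars_csv', 'companies']
```

In this toy model the factory entry only appears in `list()` after something has asked for it by name, which is the same shape as looping over the pipeline's datasets before calling `catalog.list()`.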