Hi team Is there any way to resolve factory datasets and acc Kedro #questions


Guillaume Tauzin

02/04/2025, 8:36 AM

Hi team! Is there any way to resolve factory datasets and access them from a DataCatalog/KedroDataCatalog instance? I noticed that using the CLI to list datasets

kedro catalog list

will automatically resolve them (for a given pipeline - see this bit of code) while doing

catalog.list()

in a kedro jupyter notebook will just list non-factory datasets (and parameters). Are those two returning different outputs by design or is it a bug? Thanks!


Guillaume Tauzin

02/04/2025, 8:38 AM

For a bit of context, I noticed this while using Vizro, which has a Kedro integration that relies on catalog.list. In practice, I would like to look up by name a dataset defined through a dataset factory and get its load function.

datajoely

02/04/2025, 9:06 AM

So there is a trick for doing this until we fix it. Essentially, .list() needs dataset names to match against the patterns before the factory datasets show up, so you can do

catalog.list(Pipeline.inputs() | Pipeline.outputs())


Guillaume Tauzin

02/04/2025, 9:16 AM

Thanks @datajoely, it does not work out of the box:


from kedro.framework.project import pipelines
pipeline = pipelines.get("__default__")
catalog.list(pipeline.inputs() | pipeline.outputs())

returns


AttributeError: 'set' object has no attribute 'strip'

Seems like regex_search is supposed to be a string? If I pass `regex_search=".*KWD.*"`, where KWD is part of the name of one of my factory datasets, it does not find it either.

Ankita Katiyar

02/04/2025, 10:33 AM

The factory datasets are lazy so they don’t show up in

catalog.list()

(Discussion in https://github.com/kedro-org/kedro/issues/3312) With the new catalog you can do:


catalog["<dataset_name>"]

And it’ll resolve and get you the factory dataset


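from kedro.framework.project import pipelines

# Touch every dataset name used by the pipeline so factory patterns get resolved
for dataset in pipelines['__default__'].datasets():
    catalog.exists(dataset)  # or catalog.get_dataset(dataset)

# now it'll show up
catalog.list()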
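
To illustrate why touching each name makes it appear, here is a toy sketch of lazy pattern resolution. It is not Kedro's implementation: `ToyLazyCatalog`, its regex-based matching, and the example names and paths are all hypothetical. It only mimics the behaviour described above, where listing shows materialized entries and resolving a name adds it.

```python
import re


class ToyLazyCatalog:
    """Toy model of a catalog with lazily resolved factory patterns (not Kedro code)."""

    def __init__(self, datasets, patterns):
        self._datasets = dict(datasets)  # explicitly declared entries
        self._patterns = dict(patterns)  # factory patterns, e.g. "{name}_csv"

    @staticmethod
    def _to_regex(pattern):
        # Turn "{name}_csv" into "(?P<name>.+)_csv", escaping the literal text.
        parts = re.split(r"\{(\w+)\}", pattern)
        out = ""
        for i, part in enumerate(parts):
            out += f"(?P<{part}>.+)" if i % 2 else re.escape(part)
        return re.compile(out)

    def list(self):
        # Only materialized entries are visible; patterns are never expanded here.
        return sorted(self._datasets)

    def exists(self, name):
        if name in self._datasets:
            return True
        for pattern, template in self._patterns.items():
            match = self._to_regex(pattern).fullmatch(name)
            if match:
                # Resolving a name materializes it as a side effect.
                self._datasets[name] = template.format(**match.groupdict())
                return True
        return False


# Hypothetical entries: one explicit dataset and one factory pattern.
catalog = ToyLazyCatalog(
    datasets={"companies": "data/companies.csv"},
    patterns={"{name}_csv": "data/{name}.csv"},
)

before = catalog.list()  # the factory-backed name is not listed yet
for dataset_name in ["companies", "reviews_csv"]:
    catalog.exists(dataset_name)
after = catalog.list()  # the resolved name is now listed

print(before)
print(after)
```

In this toy model the pattern itself never shows up in the listing; only names that have been asked for do, which is why looping over the pipeline's dataset names first changes what gets listed.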


Guillaume Tauzin

02/04/2025, 1:36 PM

Thank you @Ankita Katiyar! That's perfect. PS: I opened an issue to fix this on the vizro side.
