When I say `catalog.list()` in a kedro jupyter lab ...
# questions
l
When I say `catalog.list()` in a kedro jupyter lab instance, it doesn’t return all registered datasets. Anything that’s using a dataset factory seems to be missing. Is there a way to infer those somehow?
a
The factory datasets are registered after they are first used during a session run. Checking for their existence will also register them in the catalog:
catalog.exists(dataset_name)
# `pipelines` is the registry that a Kedro Jupyter/IPython session exposes
for dataset in pipelines["__default__"].data_sets():
    catalog.exists(dataset)
And then `catalog.list()` should list them.
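The workaround above can be wrapped in a small helper. This is only a sketch: `register_factory_datasets` is a hypothetical name, not a Kedro function, and it assumes `catalog` and `pipeline` are the objects a `kedro jupyter lab` session provides. The thread uses `data_sets()`; newer Kedro releases are believed to rename it to `datasets()`, so the sketch tries both.

```python
def register_factory_datasets(catalog, pipeline):
    """Touch every dataset a pipeline uses, then return the catalog listing.

    Hypothetical helper, not part of Kedro. `catalog` needs `exists(name)`
    and `list()`; `pipeline` needs `datasets()` or `data_sets()`.
    """
    # Assumption: prefer the newer method name when it is available,
    # otherwise fall back to the `data_sets()` call used in the thread.
    get_names = getattr(pipeline, "datasets", None) or pipeline.data_sets
    for name in sorted(get_names()):
        # The return value is ignored; per the thread, the lookup itself
        # is what registers a dataset that matches a factory pattern.
        catalog.exists(name)
    return catalog.list()
```

In a notebook this would be called as `register_factory_datasets(catalog, pipelines["__default__"])`.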
l
that’s brilliant! Thank you
aha, and custom datasets will have to implement an `_exists` method for this to work. That’s quite useful!
a
It should work without the `_exists` method too, since internally it just checks whether the catalog entry exists; it doesn’t call the dataset’s `exists` method.
l
I am getting a
2023-11-15 10:42:25,539 - kedro.io.core - WARNING - 'exists()' not implemented for 'DataRobotProjectDataset'. Assuming output does not exist.
But it’s fine - I’m actually rather happy to implement a custom `exists` method
a
Ah, my bad. It does need it!
It’ll throw a warning but the dataset should still get registered
l
ah you’re right. It does appear in the list
cool cool - super helpful. Thank you 🙏
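For reference, a custom dataset with an `_exists` method might look roughly like the sketch below. `TextFileDataset` is a hypothetical class for illustration only, and it is not the `DataRobotProjectDataset` from the log. In a real project it would subclass Kedro's abstract dataset base class; that base class is left out here so the sketch runs on its own.

```python
from pathlib import Path


class TextFileDataset:
    """Hypothetical custom dataset; not part of Kedro.

    In a Kedro project this class would inherit from Kedro's abstract
    dataset base class, which is omitted so the sketch is standalone.
    """

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _load(self) -> str:
        return self._filepath.read_text(encoding="utf-8")

    def _save(self, data: str) -> None:
        # Create parent folders so the first save does not fail.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        self._filepath.write_text(data, encoding="utf-8")

    def _exists(self) -> bool:
        # Providing this method is what the "'exists()' not implemented"
        # warning in the thread is asking for.
        return self._filepath.is_file()

    def _describe(self) -> dict:
        return {"filepath": str(self._filepath)}
```

The check is deliberately cheap (a file lookup), since the workaround calls `exists` once for every dataset in the pipeline.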
n
@Lukas Innig What did you expect to see instead? A factory dataset is created during a pipeline run, so the result is going to depend on which pipeline you are running.
On the other hand, `catalog.list(pipeline=<name>)` is not possible because the catalog is not aware of a Pipeline object. This is coordinated by the session/runner instead 😶
(just brainstorming out loud about how we can make this a better experience)
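Since the catalog has no per-pipeline view, one user-side option is to combine both objects yourself. The sketch below is an assumption-laden illustration, not a Kedro API: `catalog_entries_by_pipeline` is a hypothetical helper that takes the session's `catalog` and the `pipelines` mapping, and it tries `datasets()` before falling back to the `data_sets()` method used in the thread.

```python
def catalog_entries_by_pipeline(catalog, pipelines):
    """Map each pipeline name to the catalog entries that pipeline uses.

    Hypothetical helper, not part of Kedro. `pipelines` is any mapping of
    name -> pipeline object exposing `datasets()` or `data_sets()`.
    """
    result = {}
    for pipeline_name, pipeline in pipelines.items():
        # Assumption: newer releases expose `datasets()`, older ones `data_sets()`.
        get_names = getattr(pipeline, "datasets", None) or pipeline.data_sets
        used = set(get_names())
        for name in used:
            # Side effect described in the thread: factory-backed
            # datasets get registered when their existence is checked.
            catalog.exists(name)
        registered = set(catalog.list())
        # Keep only names the catalog knows about after registration.
        result[pipeline_name] = sorted(used & registered)
    return result
```

As noted in the thread, the outcome depends on which pipelines are passed in, because factory datasets only appear once something asks for them by name.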
l
I guess I was expecting to see all the datasets that I specified in catalog.yml. To be honest, I haven’t thought too deeply about it.
n
I created this ticket to document the workaround. https://github.com/kedro-org/kedro/issues/3312 I can’t think of a nice solution just now but I will keep this in mind.