Can anyone suggest the best way to access 1 Catalog definiti Kedro #questions

Ben Shaughnessy

01/21/2025, 5:16 PM

Can anyone suggest the best way to access:
1. Catalog definition
2. Pipeline definition

before the pipeline runs, and ideally outside the normal `kedro run` life cycle? I'm trying to accomplish two very different things with this:
1. Implicitly figure out which nodes depend on each other via memory datasets, to support using memory datasets in a distributed Argo pipeline running a Kedro pipeline.
2. Generate documentation via a Mermaid diagram that I can store in a README file, similar to Kedro-Viz (but with some subtle key features).

Hall

01/21/2025, 5:16 PM

Someone will reply to you shortly. In the meantime, this might help:

Juan Luis

01/21/2025, 5:24 PM

for 1. you can always instantiate the config loader and data catalog programmatically, see for example https://docs.kedro.org/en/stable/notebooks_and_ipython/notebook-example/add_kedro_to_a_notebook.html#use-kedro-s-configurat[…]-load-the-data-catalog
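A minimal sketch of what that could look like outside `kedro run`, assuming a Kedro 0.19-style project with the default `conf` folder and a `__default__` pipeline. The function name `load_project_definitions` is hypothetical, and loading the registered pipelines via `bootstrap_project` goes a step beyond the linked notebook example, so treat it as one possible approach rather than the recommended one:

```python
from pathlib import Path


def load_project_definitions(project_path, env="local"):
    """Return (catalog_config, catalog, pipeline) without running anything.

    Hypothetical helper; assumes the default "conf" folder layout.
    """
    # Kedro imports live inside the function so this module can be imported
    # even where Kedro is not installed.
    from kedro.config import OmegaConfigLoader
    from kedro.framework.project import pipelines
    from kedro.framework.startup import bootstrap_project
    from kedro.io import DataCatalog

    project_path = Path(project_path)

    # Configures the project package so that `pipelines` can be resolved.
    bootstrap_project(project_path)

    conf_loader = OmegaConfigLoader(
        conf_source=str(project_path / "conf"),
        base_env="base",
        default_run_env=env,
    )
    # Raw dictionary view of catalog.yml (the "catalog definition").
    catalog_config = conf_loader["catalog"]

    # Instantiated catalog. If entries reference credentials, they would need
    # to be passed here as a second argument.
    catalog = DataCatalog.from_config(catalog_config)

    # Registered pipeline (the "pipeline definition"); nothing is executed.
    pipeline = pipelines["__default__"]
    return catalog_config, catalog, pipeline
```

Calling `load_project_definitions("path/to/project")` would then give access to `pipeline.nodes`, where each node exposes `name`, `inputs`, and `outputs`.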

Juan Luis

01/21/2025, 5:25 PM

for 2., we've been thinking about that for a long time but there's nothing very solid yet. An early prototype was https://github.com/AlpAribal/kedro-inspect/, which you might want to have a look at
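For the Mermaid part, a rough sketch of how a diagram could be generated by hand. This is not how kedro-inspect or Kedro-Viz work; `pipeline_to_mermaid` is a hypothetical helper that only assumes each node object has `name`, `inputs`, and `outputs` attributes (as Kedro nodes do), so a stand-in `namedtuple` is enough to try it:

```python
from collections import namedtuple


def _mermaid_id(name):
    # Keep only characters that are safe in a Mermaid node id.
    return "".join(ch if ch.isalnum() else "_" for ch in name)


def pipeline_to_mermaid(nodes):
    """Render nodes and their datasets as a Mermaid flowchart string."""
    lines = ["flowchart TD"]
    for node in nodes:
        node_id = "node_" + _mermaid_id(node.name)
        # Rectangles for nodes, cylinders for datasets.
        lines.append(f'    {node_id}["{node.name}"]')
        for ds in node.inputs:
            ds_id = "data_" + _mermaid_id(ds)
            lines.append(f'    {ds_id}[("{ds}")] --> {node_id}')
        for ds in node.outputs:
            ds_id = "data_" + _mermaid_id(ds)
            lines.append(f'    {node_id} --> {ds_id}[("{ds}")]')
    return "\n".join(lines)


# Stand-in for Kedro nodes, for illustration only.
FakeNode = namedtuple("FakeNode", "name inputs outputs")

example_nodes = [
    FakeNode("clean", ["raw"], ["cleaned"]),
    FakeNode("train", ["cleaned"], ["model"]),
]
diagram = pipeline_to_mermaid(example_nodes)
```

The resulting string can be pasted into a README inside a `mermaid` code fence. With a real project, `pipeline.nodes` could be passed in place of `example_nodes`.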

Nok Lam Chan

01/21/2025, 8:20 PM

Hmm, for the pipeline I have something like this that generates an ASCII rendering of a pipeline: https://github.com/noklam/kedro-example/blob/master/ascii_hook%2Fsrc%2Fascii_hook%2Fdagascii.py Not sure if it still runs, I created it a few years ago, but it should not take too much to edit

Nok Lam Chan

01/21/2025, 8:21 PM

If you want to figure out which datasets are memory datasets, you can use `kedro catalog create`, which fills in all the missing datasets as memory datasets in the catalog. If you want to do something different, the easiest way is probably to take that logic and modify it as a new CLI command or a new hook
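A sketch of that idea applied to the dependency question, under the assumption that any dataset used by the pipeline but absent from the catalog is a memory dataset. The helpers `memory_datasets` and `memory_dependencies` are hypothetical, not part of Kedro, and this simple name check ignores dataset factory patterns in the catalog:

```python
from collections import namedtuple


def memory_datasets(nodes, catalog_names):
    """Datasets used by the nodes that have no catalog entry."""
    used = set()
    for node in nodes:
        used.update(node.inputs)
        used.update(node.outputs)
    return {
        name
        for name in used
        if name not in catalog_names
        # Parameters are not datasets that need to be passed between nodes.
        and name != "parameters"
        and not name.startswith("params:")
    }


def memory_dependencies(nodes, catalog_names):
    """Map each node name to the nodes it depends on via memory datasets."""
    in_memory = memory_datasets(nodes, catalog_names)
    producer = {
        ds: node.name for node in nodes for ds in node.outputs if ds in in_memory
    }
    return {
        node.name: sorted({producer[ds] for ds in node.inputs if ds in producer})
        for node in nodes
    }


# Stand-in for Kedro nodes, for illustration only.
FakeNode = namedtuple("FakeNode", "name inputs outputs")

example_nodes = [
    FakeNode("clean", ["raw"], ["cleaned"]),
    FakeNode("train", ["cleaned", "params:lr"], ["model"]),
]
deps = memory_dependencies(example_nodes, catalog_names={"raw", "model"})
```

Nodes linked this way would have to run in the same Argo step (or the intermediate data would have to be persisted), since a memory dataset does not survive across separate processes.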
