# questions
b
Can anyone suggest the best way to access (1) the catalog definition and (2) the pipeline definition before the pipeline runs, and ideally outside the normal `kedro run` life cycle? I'm trying to accomplish two very different things with this: 1. implicitly figure out which nodes depend on each other via memory datasets, to support using memory datasets in a distributed Argo pipeline that runs a Kedro pipeline; 2. generate documentation as a Mermaid diagram that I can store in a README file, similar to Kedro-Viz (but with some subtle key features).
h
Someone will reply to you shortly. In the meantime, this might help:
j
for 1. you can always instantiate the config loader and data catalog programmatically, see for example https://docs.kedro.org/en/stable/notebooks_and_ipython/notebook-example/add_kedro_to_a_notebook.html#use-kedro-s-configurat[…]-load-the-data-catalog
for 2., we've been thinking about that for a long time but there's nothing very solid yet... an early prototype was https://github.com/AlpAribal/kedro-inspect/ you might want to have a look
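A rough sketch of how the two suggestions above might fit together, not an official Kedro recipe. `load_definitions` and `to_mermaid` are hypothetical helper names. The first assumes a standard project layout with `conf/base` and `conf/local`; the second relies only on each node exposing `name`, `inputs` and `outputs`, which Kedro nodes do, so it can be tried here with stand-in objects.

```python
from pathlib import Path
from types import SimpleNamespace


def load_definitions(project_path, pipeline_name="__default__"):
    """Return (catalog_config, pipeline) without going through `kedro run`."""
    # Imported lazily so the rendering helper below also works without Kedro.
    from kedro.config import OmegaConfigLoader
    from kedro.framework.project import pipelines
    from kedro.framework.startup import bootstrap_project

    project_path = Path(project_path)
    bootstrap_project(project_path)  # makes the project's pipeline registry importable
    # Assumption: configuration lives in the conventional conf/base and conf/local.
    conf_loader = OmegaConfigLoader(
        conf_source=str(project_path / "conf"),
        base_env="base",
        default_run_env="local",
    )
    return conf_loader["catalog"], pipelines[pipeline_name]


def to_mermaid(nodes):
    """Render objects with .name, .inputs and .outputs as a Mermaid flowchart."""
    ids = {}

    def ident(kind, name):
        # Short, stable ids keep arbitrary node and dataset names out of Mermaid ids.
        return ids.setdefault((kind, name), f"{kind}{len(ids)}")

    lines = ["flowchart TD"]
    for node in nodes:
        node_id = ident("n", node.name)
        lines.append(f'    {node_id}["{node.name}"]')
        for ds in node.inputs:
            lines.append(f'    {ident("d", ds)}(["{ds}"]) --> {node_id}')
        for ds in node.outputs:
            lines.append(f'    {node_id} --> {ident("d", ds)}(["{ds}"])')
    return "\n".join(lines)


# Stand-in nodes for illustration; in a project, pass `pipeline.nodes` instead.
demo_nodes = [
    SimpleNamespace(name="clean", inputs=["raw"], outputs=["tidy"]),
    SimpleNamespace(name="train", inputs=["tidy"], outputs=["model"]),
]
diagram = to_mermaid(demo_nodes)
print(diagram)
```

In a real project the call would look something like `_, pipeline = load_definitions(".")` followed by `to_mermaid(pipeline.nodes)`, with the result pasted into a `mermaid` fenced block in the README. Names containing double quotes would need escaping first.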
n
Hmm, for the pipeline I have something like this that generates an ASCII rendering of the pipeline: https://github.com/noklam/kedro-example/blob/master/ascii_hook%2Fsrc%2Fascii_hook%2Fdagascii.py Not sure if it still runs, I created it a few years ago, but it should not take much to edit.
If you want to figure out which datasets are memory datasets, you can use `kedro catalog create`, which fills in all the missing datasets with memory datasets in the catalog. If you want to do something different, the easiest way is probably to take that logic and modify it into a new CLI command or a new hook.
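A minimal sketch of the "take that logic and modify it" idea, under the assumption that any dataset a node produces which has no catalog entry ends up as a memory dataset. `memory_dataset_edges` is a hypothetical helper, not part of Kedro. It compares names literally, so catalog entries defined through dataset factory patterns would need extra matching.

```python
from types import SimpleNamespace


def memory_dataset_edges(nodes, catalog_names):
    """Map each produced dataset missing from the catalog to (producer, consumer) pairs."""
    producers = {}
    consumers = {}
    for node in nodes:
        for ds in node.outputs:
            producers[ds] = node.name
        for ds in node.inputs:
            consumers.setdefault(ds, []).append(node.name)
    # Only datasets that are produced, consumed, and absent from the catalog
    # force two nodes to share memory (or need a persisted hand-off).
    return {
        ds: [(producer, consumer) for consumer in consumers[ds]]
        for ds, producer in producers.items()
        if ds not in catalog_names and ds in consumers
    }


# Stand-in nodes for illustration; in a project, pass `pipeline.nodes` and the
# keys of the loaded catalog configuration instead.
demo_nodes = [
    SimpleNamespace(name="clean", inputs=["raw"], outputs=["tidy"]),
    SimpleNamespace(name="train", inputs=["tidy"], outputs=["model"]),
    SimpleNamespace(name="report", inputs=["tidy", "model"], outputs=["report_md"]),
]
edges = memory_dataset_edges(demo_nodes, catalog_names={"raw", "model"})
print(edges)
```

For the Argo use case, each pair in the result marks two nodes that either have to run in the same pod or need the dataset between them persisted.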