Question on Kedro with Jupyter Notebook. We already have pipelines built, and we want to prepare notebooks as a guide for CSTs and other data scientists. Ideally, a notebook would run the nodes one by one, step by step, showing the significant outputs along the way.
The Kedro resources I found are mostly notebook-to-pipeline guides, whereas what I need is the other way around.
Another big limitation is that I can only run a pipeline once unless I restart the notebook kernel. So it is impossible to run the first node, pause, look at the outputs, run the second node, and so on. The only option is to run the full pipeline beforehand and then show the intermediate outputs afterwards, without the corresponding snippets of code actually running.
All things considered, it seems to me that the best solution would be to import the node functions and, using the catalog defined in catalog.yaml, run the nodes one by one as plain functions, providing all arguments in the notebook.
Please let me know if you have had a similar challenge and found a better solution. 🙂
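A minimal sketch of the "import the nodes and call them as plain functions" idea described above. It is not taken from any real project: the node functions (clean_rows, total_amount), the dataset names and the InMemoryCatalog class are all hypothetical. InMemoryCatalog is only a stand-in that mimics the load(name) and save(name, data) calls of the catalog object available in a Kedro notebook, so that the snippet runs on its own.

```python
# Stand-in for the `catalog` object of a Kedro notebook session.
# It mimics only the load/save calls used below (assumption for illustration).
class InMemoryCatalog:
    def __init__(self, data):
        self._data = dict(data)

    def load(self, name):
        return self._data[name]

    def save(self, name, value):
        self._data[name] = value


# Hypothetical node functions. In a real project these would be imported
# from the pipeline's nodes module instead of being defined in the notebook.
def clean_rows(rows):
    """Drop records whose amount is missing."""
    return [row for row in rows if row["amount"] is not None]


def total_amount(rows):
    """Sum the amount field over all records."""
    return sum(row["amount"] for row in rows)


catalog = InMemoryCatalog(
    {"raw_rows": [{"amount": 10}, {"amount": None}, {"amount": 5}]}
)

# Step 1: run the first node as a plain function and inspect its output.
cleaned = clean_rows(catalog.load("raw_rows"))
catalog.save("clean_rows", cleaned)
print(cleaned)

# Step 2: feed the intermediate result into the next node.
total = total_amount(catalog.load("clean_rows"))
catalog.save("total", total)
print(total)
```

In an actual Kedro notebook the stand-in class would be dropped and the existing catalog used instead. Note that saving through the real catalog writes to the configured datasets, so the save calls may be better left out of a purely demonstrative notebook.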
11/14/2022, 12:45 PM
Break your tasks into small pipelines and run them manually with Kedro's sequential runner. That should cover almost everything you need.
11/14/2022, 2:51 PM
Also, this is the open-source world: CSTs don't exist here!
Nok Lam Chan
11/15/2022, 9:52 AM
Quick question: why do you need to restart the kernel?
Are you aware of the %reload_kedro command? You can use the session to run the pipeline node by node.