Giovanna Cavali (04/03/2024, 3:30 PM)
Deepyaman Datta (04/03/2024, 3:30 PM)
Juan Luis (04/03/2024, 3:31 PM)
datajoely (04/03/2024, 3:32 PM)
Giovanna Cavali (04/03/2024, 3:36 PM)
Giovanna Cavali (04/03/2024, 3:36 PM)
datajoely (04/03/2024, 3:37 PM)
datajoely (04/03/2024, 3:37 PM)
Giovanna Cavali (04/03/2024, 3:41 PM)
Nok Lam Chan (04/03/2024, 3:42 PM)
And the idea was to track performance & plots using kedro-viz.
Giovanna Cavali (04/03/2024, 3:44 PM)
Nok Lam Chan (04/03/2024, 3:45 PM)
datajoely (04/03/2024, 3:45 PM)
after_pipeline_run hook… https://docs.jupyter.org/en/latest/running.html#using-a-command-line-interface
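datajoely's pointer (an after_pipeline_run hook combined with Jupyter's command-line interface) could be sketched roughly as below. This is a hedged sketch, not datajoely's actual code: the class name and notebook path are hypothetical, and in a real Kedro project the method would be decorated with @hook_impl (from kedro.framework.hooks), the class registered in settings.py, and the hook's exact argument list should follow the hook spec of your Kedro version.

```python
import subprocess

def nbconvert_command(notebook_path: str) -> list[str]:
    # Build the Jupyter CLI call that executes the notebook and renders it to HTML.
    return ["jupyter", "nbconvert", "--to", "html", "--execute", notebook_path]

class RenderReportHooks:
    # In a real project: decorate with @hook_impl and register in settings.py
    # (HOOKS = (RenderReportHooks(),)); argument names follow the Kedro hook spec.
    def after_pipeline_run(self, run_params, pipeline, catalog):
        # Render the reporting notebook once the pipeline has finished.
        subprocess.run(nbconvert_command("notebooks/report.ipynb"), check=True)
```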
Giovanna Cavali (04/03/2024, 3:47 PM)
Giovanna Cavali (04/03/2024, 3:48 PM)
Nok Lam Chan (04/03/2024, 3:53 PM)
Giovanna Cavali (04/03/2024, 3:54 PM)
Nok Lam Chan (04/03/2024, 3:54 PM)
Iñigo Hidalgo (04/03/2024, 3:54 PM)
Nok Lam Chan (04/03/2024, 3:55 PM)
session.run, and put that at the top. You should only run it when needed tho.
Nok Lam Chan (04/03/2024, 3:56 PM)
# Cell 1
%load_ext kedro.ipython
session.run(pipeline_name="abc")

# Cell 2
catalog.load("my_plot")  # latest version by default

# Cell 3
catalog.load("my_plot2")  # etc.

The documentation on using Kedro in notebooks may help: https://docs.kedro.org/en/stable/notebooks_and_ipython/kedro_and_notebooks.html
Giovanna Cavali (04/03/2024, 4:02 PM)
Artur Dobrogowski (04/03/2024, 5:26 PM)
Florian d (04/03/2024, 5:43 PM)
Giovanna Cavali (04/03/2024, 5:55 PM)
Florian d (04/03/2024, 5:58 PM)
datajoely (04/03/2024, 5:59 PM)
Nok Lam Chan (04/03/2024, 5:59 PM)
Florian d (04/03/2024, 6:00 PM)
Florian d (04/03/2024, 6:00 PM)
Giovanna Cavali (04/03/2024, 6:01 PM)
Artur Dobrogowski (04/03/2024, 6:02 PM)
Nok Lam Chan (04/03/2024, 6:04 PM)
Nok Lam Chan (04/03/2024, 6:05 PM)
Artur Dobrogowski (04/03/2024, 6:07 PM)
Artur Dobrogowski (04/03/2024, 6:08 PM)
Artur Dobrogowski (04/03/2024, 6:08 PM)
Artur Dobrogowski (04/03/2024, 6:09 PM)
Nok Lam Chan (04/03/2024, 6:09 PM)
> One of the things we also wanted was to save some plots and printouts from the notebook as catalog entries… so we can keep track of metrics/plots…
I wonder why you need to save plots from the notebook. Is it because someone needs to run the code from the notebook instead?
Giovanna Cavali (04/03/2024, 6:13 PM)
Nok Lam Chan (04/03/2024, 6:17 PM)
Florian d (04/03/2024, 6:19 PM)
> But maybe we could build the metrics & plots we want to track in separate nodes and leave the notebook run as a hook at the end of the pipeline…
My view is to do that in the notebook. Otherwise you don't gain anything if you load the pre-generated metrics plus plots in the notebook, as you'd have to rerun the pipeline to get those. So I would do all the heavy lifting that needs to happen in the pipeline there, and "most" of the reporting stuff in the notebook.
Florian d (04/03/2024, 6:20 PM)
Nok Lam Chan (04/03/2024, 6:20 PM)
There is %load_node, which may help with retrieving node logic in a notebook.
Artur Dobrogowski (04/03/2024, 6:21 PM)
The catch with %load_node is that you need to have all of its inputs defined as loadable by the data catalog; you can't rely on memory datasets.
Nok Lam Chan (04/03/2024, 6:22 PM)
(If you use %load_node and have problems with it, or you like it, just tag me.)
It's impossible to load something in memory, because by the time you open a notebook it's not there anymore. What we can do is figure out exactly what you need to re-run to have those datasets in memory again. You also don't want to keep everything in memory, so in reality you will have different checkpoints in your big pipeline, and the re-run part will be minimal.
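Artur's caveat means that any inputs %load_node should recreate must be persisted through the catalog rather than left as in-memory (MemoryDataset) outputs. A hypothetical catalog.yml entry acting as such a checkpoint is sketched below; the dataset name and path are made up, and the type spelling varies across kedro-datasets versions (e.g. ParquetDataSet in older releases).

```yaml
# catalog.yml -- persist an intermediate dataset so %load_node can reload it
model_input_table:
  type: pandas.ParquetDataset
  filepath: data/03_primary/model_input_table.parquet
```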
Florian d (04/03/2024, 6:23 PM)
Giovanna Cavali (04/03/2024, 6:31 PM)
Giovanna Cavali (04/10/2024, 4:52 PM)
(…nbconvert) with the path as input and no output.
The path to the notebook is set in the parameters YAML file, and we wanted to know if there is a way to use the path from the catalog.yml file instead. We wanted to avoid having paths in parameters.yml.
Juan Luis (04/10/2024, 5:13 PM)
def create_pipeline():
    return pipeline([
        node(
            func=render_notebook_with_nbconvert,
            inputs=["params:notebook_0_filepath"],
            outputs=None,
        )
    ])
am I right?
One way you can do it, even if it's a bit unorthodox, is to define your own dataset:
# catalog.yml
notebook0:
  type: ipynb_datasets.IPYNBDataset
  filepath: notebooks/notebook0.ipynb
and then you'd need to define a custom dataset:
@dataclass
class IPYNBNotebook:
    filepath: str

class IPYNBDataset(AbstractDataset):
    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self):
        return IPYNBNotebook(self._filepath)

    ...
and then your node would do:
def render_notebook_with_nbconvert(notebook: IPYNBNotebook):
    return nbconvert.render(notebook.filepath)
All of this is pseudocode, but I hope it makes sense. See https://docs.kedro.org/en/stable/extend_kedro/custom_datasets.html for more info on custom datasets.
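Juan Luis's pseudocode can be made runnable with small stand-ins: here AbstractDataset is stubbed so the sketch is self-contained (in a real project it comes from kedro.io and also requires _save and _describe), and the nbconvert call is elided, with the node just returning the path. All names follow his sketch; nothing here is the actual Kedro API beyond what he outlined.

```python
from dataclasses import dataclass

# Stub standing in for kedro.io.AbstractDataset so this sketch runs standalone.
class AbstractDataset:
    def load(self):
        return self._load()

@dataclass
class IPYNBNotebook:
    filepath: str

class IPYNBDataset(AbstractDataset):
    def __init__(self, filepath: str):
        self._filepath = filepath

    def _load(self) -> IPYNBNotebook:
        # Hand the node a lightweight handle to the notebook, not its contents.
        return IPYNBNotebook(self._filepath)

def render_notebook_with_nbconvert(notebook: IPYNBNotebook) -> str:
    # A real node would invoke nbconvert here, e.g. via
    # `jupyter nbconvert --to html --execute <filepath>`.
    return notebook.filepath

print(render_notebook_with_nbconvert(IPYNBDataset("notebooks/notebook0.ipynb").load()))
```

The point of the IPYNBNotebook wrapper is that the catalog entry carries the path, so no path needs to live in parameters.yml.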
Giovanna Cavali (04/10/2024, 5:27 PM)
Giovanna Cavali (04/16/2024, 1:37 PM)
Juan Luis (04/16/2024, 1:40 PM)
Juan Luis (04/16/2024, 1:41 PM)
Giovanna Cavali (04/16/2024, 1:56 PM)
Giovanna Cavali (04/16/2024, 1:56 PM)
Juan Luis (04/16/2024, 1:57 PM)
Giovanna Cavali (04/16/2024, 5:01 PM)