# questions
g
Hi team, So I am running a notebook as part of a kedro pipeline (using nbconvert) and the notebook loads kedro context and saves metrics to catalog. It works well but the metrics are not showing in kedro viz experiment tracking. I think it is because the timestamp is different when we compare the entries of the regular kedro nodes and the metrics saved in the notebook. Any ideas here on how to solve this?
n
It needs to be the same Kedro session in order to use Kedro's experiment tracking. Did you create a separate session to run the notebook?
g
how can we make sure we have the same session?
I am actually just using %load_ext kedro.ipython
n
Is it possible to pass in anything to nbconvert instead of re-creating a new session?
g
Hmm, not sure. If we use papermill, we can pass in parameters.
Is there a way to load a specific session in the jupyter notebook?
n
not that I am aware of
You may be able to create the KedroSession manually and force the session_id to be the same
But either way you need to pass in some information into the notebook
g
yes, makes sense!!
any documentation on how we can force the session_id? once we are loading the context in the notebook?
and actually how we can extract the session_id from the current run?
n
I don't think there is any documentation on this, it's more of a hack than an official API.
The session is protected by design, which ensures that it's always unique.
g
Got it!! thank you for the help!!! let's see what we can do
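For reference, one way to extract the session_id from the current run is a hook. The sketch below is only an illustration: the class name SessionIdHooks is hypothetical, and it assumes a Kedro version (0.18.x or 0.19.x) where run_params carries the id under the "session_id" key. The hook would still need to be registered in the project's settings.py, e.g. HOOKS = (SessionIdHooks(),).

```python
from typing import Any, Dict, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can be read and exercised without Kedro installed.
    def hook_impl(func):
        return func


class SessionIdHooks:
    """Hypothetical hook class that remembers the session_id of the current run."""

    def __init__(self) -> None:
        self.session_id: Optional[str] = None

    @hook_impl
    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        # Assumption: run_params contains a "session_id" entry (Kedro 0.18.x / 0.19.x).
        self.session_id = run_params.get("session_id")
```

How the stored value reaches the node that launches the notebook is left open here; it depends on how the project is wired.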
Hello team, going back to this issue where I want to run a notebook (kedro ipython) with the same session_id as the rest of the pipeline, I was able to:
• extract session_id using hooks
• pass session_id as a parameter to the notebook using papermill
• and then create a Kedro session with a specific session_id with the code below:
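# Creating Kedro Session, Context and Catalog
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from pathlib import Path
import logging, sys

# session_id is expected to be injected by papermill as a notebook parameter
bootstrap_project(Path(".."))
# Calling the constructor directly (instead of KedroSession.create()) to force the id
session = KedroSession(session_id=session_id)
context = session.load_context()
# _get_catalog is private API; save_version is set to match session_id
catalog = context._get_catalog(save_version=session_id)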
I wanted to check if there's any risk in forcing a session_id. Is there anything we should watch out for?
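The papermill step mentioned above could look roughly like the sketch below. The helper names notebook_parameters and run_notebook are hypothetical, and the notebook paths are whatever the project uses. papermill injects the values as a new cell placed after the cell tagged "parameters" (or at the top of the notebook if no cell has that tag), so session_id is defined before the session-creation cell runs.

```python
from typing import Any, Dict


def notebook_parameters(session_id: str) -> Dict[str, Any]:
    """Build the parameters dict that papermill injects into the notebook."""
    return {"session_id": session_id}


def run_notebook(input_path: str, output_path: str, session_id: str) -> None:
    """Execute the notebook with session_id available as a variable."""
    # Imported lazily so the helper above can be used without papermill installed.
    import papermill as pm

    pm.execute_notebook(
        input_path,
        output_path,
        parameters=notebook_parameters(session_id),
    )
```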
m
@Giovanna Cavali what you're doing here is outside of any recommended use of Kedro, and in fact outside of any publicly exposed API. The session should be created through KedroSession.create(), which doesn't take the session_id argument. There's an open issue exactly for allowing this: https://github.com/kedro-org/kedro/issues/1731
There's no clear view on all the consequences of doing this, but the most important part is that the save_version and session_id are connected. With context._get_catalog(save_version=session_id) you're also using private API.
Long story short: you can do all of the above, but none of it is recommended, and you have to take responsibility for understanding the consequences of using private APIs (never do this in a production system if you can avoid it) 😅