I am cooking up a feature that helps with debugging Kedro No... (Kedro, #user-research)

Nok Lam Chan

01/16/2024, 4:35 PM

I am cooking up a feature that helps with debugging Kedro nodes. Would something like this be useful to you? What more would you like to see? The function will prepare the necessary inputs and imports for you, so you can focus on debugging the function with less hassle loading the datasets.

debug function.mov
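As a rough illustration of the idea in this message (not Kedro's implementation), the sketch below turns a node description into notebook cell sources: imports, dataset loading, then the function call. `NodeSpec`, `to_identifier` and `build_debug_cells` are hypothetical names; the generated code assumes a `catalog` object with a `load(name)` method is already defined, as it is after `%load_ext kedro.ipython`.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Hypothetical stand-in for a Kedro node, holding only what this sketch needs."""

    func_name: str
    inputs: list[str]
    import_lines: list[str]


def to_identifier(dataset_name: str) -> str:
    """Turn a dataset name such as "params:model_options" into a valid variable name."""
    return "".join(ch if ch.isalnum() else "_" for ch in dataset_name)


def build_debug_cells(node: NodeSpec) -> list[str]:
    """Return cell sources: imports, dataset loading, then the function call."""
    imports_cell = "\n".join(node.import_lines)
    # One catalog.load(...) per input; assumes `catalog` already exists in the notebook.
    loading_cell = "\n".join(
        f'{to_identifier(name)} = catalog.load("{name}")' for name in node.inputs
    )
    # Inputs are passed positionally, in the order they are declared.
    args = ", ".join(to_identifier(name) for name in node.inputs)
    call_cell = f"{node.func_name}({args})"
    return [imports_cell, loading_cell, call_cell]


if __name__ == "__main__":
    node = NodeSpec(
        func_name="split_data",
        inputs=["model_input_table", "params:model_options"],
        import_lines=["import pandas as pd"],
    )
    for cell in build_debug_cells(node):
        print(cell, end="\n\n")
```

The node and dataset names in the demo are made up; the point is only that each input becomes one load statement and one argument.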

datajoely

01/16/2024, 4:39 PM

that is very cool

datajoely

01/16/2024, 4:40 PM

would like a `%list_nodes` companion

datajoely

01/16/2024, 4:40 PM

it’s also technically `%load_node_func`

Iñigo Hidalgo

01/16/2024, 4:40 PM

wow

Nok Lam Chan

01/16/2024, 4:40 PM

What does it do?

datajoely

01/16/2024, 4:41 PM

just tells you the list of functions available
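`%list_nodes` is only a suggestion in this thread. A minimal sketch of the listing logic, assuming pipelines are given as a plain mapping from pipeline name to node names; `list_nodes` is a hypothetical helper. In a real session the names could presumably be gathered from the `pipelines` mapping Kedro exposes (each `Pipeline` has `.nodes`, each node a `.name`), and the function could be wrapped with IPython's `register_line_magic`, but neither step is shown here.

```python
from typing import Mapping, Optional, Sequence


def list_nodes(
    pipelines: Mapping[str, Sequence[str]], pipeline: Optional[str] = None
) -> list[str]:
    """Return sorted, de-duplicated node names, optionally for one pipeline only."""
    if pipeline is not None:
        if pipeline not in pipelines:
            raise KeyError(f"Unknown pipeline: {pipeline!r}")
        names = list(pipelines[pipeline])
    else:
        # A node usually appears in several pipelines (e.g. "__default__" and its parts).
        names = [name for node_names in pipelines.values() for name in node_names]
    return sorted(set(names))


if __name__ == "__main__":
    example = {
        "__default__": ["split_data", "train_model"],
        "data_science": ["split_data", "train_model"],
    }
    print("\n".join(list_nodes(example)))
```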

marrrcin

01/17/2024, 9:36 AM

Looks awesome @Nok Lam Chan. Would there be a path back from the notebook into the project’s files (as there was with `kedro jupyter convert`)?

datajoely

01/17/2024, 9:48 AM

Maybe something like `%update_node`? 🤔

Nok Lam Chan

01/17/2024, 1:21 PM

For now, it only goes one way, from code to notebook, because the notebook is meant to be short-lived and mainly used for debugging. Enabling the other direction would mean encouraging people to do the actual development work in the notebook, and it is also much harder to get right because editing source files is quite risky. It's also relatively easy to do manually (just copy and paste the whole cell back into a .py file). That said, it's up for discussion, but I think it deserves its own GitHub issue and a separate discussion:
• What if it's a packaged project (i.e. installed in `site-packages`)?
• Permission problems (you may not have access to modify the source code, e.g. in a Databricks Repo)
• Should it work only when the node is already written in a source file, i.e. it has a specific file to live in, or do we also need to support developing a new node in the notebook?
• Does it only need to handle the node, or does it also need to handle the corresponding `catalog.yml` entry and `pipeline.py`?
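The first two concerns (packaged projects and missing write permission) can at least be detected before any write-back is attempted. A minimal sketch, assuming "packaged" simply means the source file sits under a `site-packages` directory; `copy_back_blockers` is a hypothetical helper, not part of Kedro.

```python
import os
from pathlib import Path


def copy_back_blockers(source_file: str) -> list[str]:
    """Return reasons why writing notebook code back to `source_file` looks unsafe."""
    path = Path(source_file)
    reasons: list[str] = []
    # Heuristic (assumption): an installed, packaged project lives under site-packages.
    if "site-packages" in path.parts:
        reasons.append("source is inside site-packages (packaged project)")
    if not path.exists():
        reasons.append("source file does not exist")
    elif not os.access(path, os.W_OK):
        # e.g. a read-only checkout where the user cannot modify files.
        reasons.append("no write permission on source file")
    return reasons
```

An empty list only means none of these checks fired; it does not settle the other open questions above.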

datajoely

01/17/2024, 1:23 PM

I think it’s excellent in its current form; printing the file path it was retrieved from is an MVP solution of sorts too
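Printing where a function came from, as suggested here, needs only the standard library. A minimal sketch using `inspect.getsourcefile` and `inspect.getsourcelines`; `describe_origin` is a hypothetical helper name, and the demo uses a standard-library function so it runs without a Kedro project.

```python
import inspect
from typing import Callable


def describe_origin(func: Callable) -> str:
    """Return "path/to/file.py:line" for where `func` is defined."""
    source_file = inspect.getsourcefile(func)
    if source_file is None:
        raise ValueError(f"Cannot locate source for {func!r}")
    # getsourcelines returns (list_of_lines, first_line_number), 1-based.
    _, first_line = inspect.getsourcelines(func)
    return f"{source_file}:{first_line}"


if __name__ == "__main__":
    import json

    # Prints the path of json/__init__.py plus the line where dumps is defined.
    print(describe_origin(json.dumps))
```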

Zhee

01/17/2024, 4:58 PM

Hi, very nice! Could this also work directly in `kedro ipython`?

Nok Lam Chan

01/17/2024, 5:01 PM

It won't work in IPython: how would you edit multiple cells in an IPython terminal? (I'm not sure that's possible.) In addition, because of the backend support it relies on, it will only work with Jupyter Notebook >7.0 (or JupyterLab).

Nok Lam Chan

01/17/2024, 5:02 PM

Initial support will cover Jupyter Notebook/Lab (not VS Code notebooks or Databricks notebooks).
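Because support depends on the Notebook version, a project could check the installed package up front. A rough sketch, assuming the requirement is the `notebook` package at major version 7 or newer (JupyterLab would need its own check, and this says nothing about which frontend is actually running); `notebook_supports_load_node` is a hypothetical helper with deliberately simple version parsing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Extract the major version number, e.g. "7.0.6" -> 7."""
    return int(version_string.split(".")[0])


def notebook_supports_load_node(installed: Optional[str] = None) -> bool:
    """Assumed rule for this sketch: the notebook package must be version 7 or newer."""
    if installed is None:
        try:
            installed = version("notebook")
        except PackageNotFoundError:
            return False  # Notebook is not installed at all.
    return major_version(installed) >= 7
```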

Zhee

01/17/2024, 5:44 PM

OK, thanks. Yes, you're right about the multiple cells! IPython is nice for exploring with ephemeral code; I often use it to debug or understand things without creating any extra files. But that's not the same use case here.

Nok Lam Chan

01/17/2024, 9:24 PM

It’s still a new feature, so we are very open to changes. If you can think of how it could also be useful in an IPython terminal, I am happy to explore that too!

Ivan Danov

01/19/2024, 11:03 AM

I think this is awesome. I've been waiting for this for years and now it's becoming a reality thanks to @Nok Lam Chan! 🙇 An IPython version would also be awesome, maybe one that just loads the datasets and the imports for you.
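For the IPython variant suggested here, one option is to skip cell generation and load a node's inputs straight into the interactive namespace. A minimal sketch, assuming an object with a `load(name)` method like Kedro's `DataCatalog`; `FakeCatalog` and `load_inputs_into` are hypothetical stand-ins so the example runs on its own. In an IPython session the namespace could be `get_ipython().user_ns`.

```python
from typing import Any, MutableMapping, Protocol, Sequence


class SupportsLoad(Protocol):
    def load(self, name: str) -> Any: ...


class FakeCatalog:
    """Hypothetical in-memory catalog standing in for Kedro's DataCatalog."""

    def __init__(self, data: dict[str, Any]):
        self._data = data

    def load(self, name: str) -> Any:
        return self._data[name]


def load_inputs_into(
    catalog: SupportsLoad, inputs: Sequence[str], namespace: MutableMapping[str, Any]
) -> list[str]:
    """Load each input, bind it to a valid variable name, and return the names bound."""
    bound = []
    for name in inputs:
        # "params:test_size" -> "params_test_size"
        var_name = "".join(ch if ch.isalnum() else "_" for ch in name)
        namespace[var_name] = catalog.load(name)
        bound.append(var_name)
    return bound
```

Loading directly avoids the multi-cell problem raised above, at the cost of not copying the function body anywhere.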

Artur Dobrogowski

02/02/2024, 11:38 AM

What happens when you re-run the command? Does it add the relevant code again in new cells?

Nok Lam Chan

02/02/2024, 12:44 PM

It creates new cells. You are not supposed to run that cell twice: once your code is in the notebook, you debug there until you are done and then copy the code back to the source. Are you thinking of something like "reloading" the source code? Could you elaborate on how you would use it?

Artur Dobrogowski

02/02/2024, 12:53 PM

Sometimes when the kernel hangs, it's useful to kill it and then run all cells to restart and resume work, and I wonder how it behaves then.

Artur Dobrogowski

02/02/2024, 12:54 PM

I guess you could just comment out the magic line after using it

Artur Dobrogowski

02/02/2024, 12:55 PM

Or make it edit itself if it's not meant to be run many times :p
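The idea of neutralising the magic after it has run can be illustrated as a plain text transform on the cell source. A minimal sketch, assuming the cell is available as a string; `comment_out_magic` is a hypothetical helper, and actually rewriting the running cell in Jupyter would need frontend support that is not shown here.

```python
def comment_out_magic(cell_source: str, magic: str = "%load_node") -> str:
    """Prefix lines that invoke `magic` with "# " so re-running the cell is harmless."""
    out = []
    for line in cell_source.splitlines():
        stripped = line.lstrip()
        # Match the magic exactly (alone, or followed by a space) so that
        # "%load_node" does not also catch a different magic like "%load_node_func".
        if stripped == magic or stripped.startswith(magic + " "):
            indent = line[: len(line) - len(stripped)]
            out.append(f"{indent}# {stripped}")
        else:
            out.append(line)
    return "\n".join(out)
```

Applying it twice gives the same result as applying it once, which is the property a "run all cells" restart needs.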

Nok Lam Chan

02/02/2024, 3:11 PM

Good point. I'm not sure if it's feasible, or whether editing the cell you are running would feel too intrusive to users, but it's not a bad idea.

Nok Lam Chan

02/02/2024, 4:52 PM

The first draft is merged now! If you are brave enough to try it before the release, go for it. We still need to get the docs in, and there are some limitations. The release date is not fixed yet, but it should be soon.

```bash
git clone git@github.com:kedro-org/kedro.git
cd kedro
pip install .
```

Then, in a notebook:

```python
%load_ext kedro.ipython
# %reload_kedro <project_path>  (only if it fails to find the project)
%load_node <node_name>
```

datajoely

02/02/2024, 4:52 PM

🔥

Nok Lam Chan

03/04/2024, 11:07 PM

I am sure most people have seen the announcement already, but just in case: this came out last week: https://kedro-org.slack.com/archives/C03RKAQ0MGQ/p1709221377663529

Nok Lam Chan

04/03/2024, 6:29 PM

If you have opinions about this feature or find bugs, please open an issue or comment in https://github.com/kedro-org/kedro/discussions/3754. It helps us prioritise the features you need 🙏🏼
