# questions
a
Hello! Do you have any tips for debugging nodes and functions in Kedro? Here's what I'm trying to do: I want to make incremental updates to some functions defined in nodes.py and then verify their output. Typically, these functions take data specified in catalog.yml as parameters. I'm currently using Kedro's IPython environment, which lets me load data with catalog.load('datasetname'). However, I find it a bit confusing to figure out how to run the functions I've defined for a specific pipeline. I use %reload_kedro to refresh my Kedro IPython environment. I'm aware that it's possible to run individual nodes (by slicing the pipeline), but I'm wondering if it's also possible to run only a specific function before fully defining the nodes. I'd greatly appreciate any insights or best practices for carrying out these incremental updates effectively.
n
if it’s also possible to run only a specific function before fully defining the nodes.
What do you mean?
h
Well, if you want to test your node functions before defining them in the pipeline, I don't see the problem: you can test them by running a script or, better, in the Kedro notebook (which lets you load the data, run your function, and check the output without saving it). Once you're sure the outputs are what you want, you can connect the function to your pipeline. Personally, I prefer to connect my nodes to the pipeline first, using MemoryDataset outputs, then test-run them from the Kedro notebook and inspect the results. I usually do it like this:
res = session.run(node_names=["node_im_testing"])
If the outputs aren't what I want, or if the node fails, I can debug the cell containing that line with breakpoints set inside the node. Once I'm sure the node works, I connect its outputs to the catalog or to the next nodes, clear my notebook, and I'm done 😉. Hope it helps!
a
@Nok Lam Chan for example, I'm creating a new function process_three_data(dataA, dataB, dataC) in nodes.py under the data_processing folder. I just want to run this specific function and update it incrementally. Typically, in a single-script Python file, I can just Shift+Enter the lines of code I want and they are sent to the Python terminal. But when I'm in the Kedro IPython environment, I'm not sure how to make this kind of incremental change to my code and test it.
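One way to approach this from the interactive session is to call the function directly on inputs loaded from the catalog, without touching the pipeline. The sketch below is only an illustration: StandInCatalog is a hypothetical stand-in so the snippet runs outside Kedro (inside kedro ipython or kedro jupyter notebook the real catalog variable is already provided), and the body of process_three_data is invented for the example.

```python
class StandInCatalog:
    """Hypothetical stand-in that mimics only catalog.load(name)."""

    def __init__(self, datasets):
        self._datasets = datasets

    def load(self, name):
        return self._datasets[name]


def process_three_data(data_a, data_b, data_c):
    """Hypothetical node function: add three aligned columns row by row."""
    return [a + b + c for a, b, c in zip(data_a, data_b, data_c)]


# In a Kedro session, skip this line and use the injected `catalog` instead.
catalog = StandInCatalog(
    {"dataA": [1, 2, 3], "dataB": [10, 20, 30], "dataC": [100, 200, 300]}
)

# Load the inputs by dataset name, then call the function like any other.
inputs = [catalog.load(name) for name in ("dataA", "dataB", "dataC")]
result = process_three_data(*inputs)
print(result)
```

Nothing is saved back to the catalog here, so the function can be edited and called again as often as needed before it is wired into a node.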
@Hugo Rebeix right, I think I'll stick with the kedro jupyter notebook for now. But yeah, ideally I want to update my functions in nodes.py and use something like Shift+Enter to see the output of the function.
👍 1
h
Yeah, the downside of the session.run() method is that you have to restart the kernel after each run and/or each change to the node (or maybe %reload_kedro lets you reload everything without restarting the kernel).
n
%reload_kedro will do.
🚀 1
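For context on what "reloading without restarting the kernel" involves, here is a minimal standard-library sketch that edits a module on disk and re-imports it with importlib.reload. The module name scratch_nodes and its clean function are hypothetical, and this is not what %reload_kedro does internally. In IPython, the built-in autoreload extension (%load_ext autoreload, then %autoreload 2) automates the same idea.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip .pyc files so the edited source is always recompiled on reload.
sys.dont_write_bytecode = True

# Create a throwaway module on disk (hypothetical stand-in for nodes.py).
workdir = Path(tempfile.mkdtemp())
module_path = workdir / "scratch_nodes.py"
module_path.write_text(
    "def clean(values):\n"
    "    return [v for v in values if v is not None]\n"
)
sys.path.insert(0, str(workdir))

scratch_nodes = importlib.import_module("scratch_nodes")
first = scratch_nodes.clean([3, None, 1])

# Simulate editing the function in an editor, then reload in place.
module_path.write_text(
    "def clean(values):\n"
    "    return sorted(v for v in values if v is not None)\n"
)
importlib.reload(scratch_nodes)
second = scratch_nodes.clean([3, None, 1])

print(first, second)
```

Note that names imported with from module import name keep pointing at the old function after a reload, so calling through the module object (scratch_nodes.clean) is the safer habit.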
I'm not sure this is a Kedro-specific issue; it seems you're having a problem with the IPython terminal?
Typically, in a single-script Python file, I can just Shift+Enter the lines of code I want and they are sent to the Python terminal.
If I understand correctly, it won’t work if it’s a function? If you want to achieve the same thing in any interactive terminal or Jupyter, you have two choices: 1. Use a debugger (it’s built for exactly this purpose and has more advanced features). 2. Copy and paste the function body and remove the indentation, etc.
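As a rough illustration of the debugger option, the standard library's pdb.runcall starts the debugger at the first line of a function call, so the body can be stepped through line by line much like Shift+Enter on a script. The function body and the STEP_THROUGH flag below are hypothetical. The flag only exists so the snippet also runs non-interactively.

```python
import pdb


def process_three_data(data_a, data_b, data_c):
    """Hypothetical node function with intermediate values worth inspecting."""
    rows = list(zip(data_a, data_b, data_c))
    totals = [sum(row) for row in rows]
    return totals


# Flip to True in an interactive session to step through the function
# with pdb commands such as `n` (next), `p rows` (print) and `c` (continue).
STEP_THROUGH = False

args = ([1, 2], [10, 20], [100, 200])
if STEP_THROUGH:
    totals = pdb.runcall(process_three_data, *args)
else:
    totals = process_three_data(*args)

print(totals)
```

Placing a breakpoint() call inside the function is an alternative when the function is invoked indirectly, for example through a pipeline run.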
a
@Nok Lam Chan yes, I think I'll use the debugger or the jupyter notebook option at the moment. Thanks for the tips!
n
I proposed creating a Jupyter magic that does all this copying and pasting and stitches the pieces together nicely in a Jupyter notebook, which would allow a workflow similar to the one you described. https://github.com/kedro-org/kedro/issues/1832
👍 1
If you think this would be helpful please upvote and leave some comments there to help us prioritise.
h
@Afiq Johari in VS Code you can use the debugger on a single notebook cell 😉
👍 1
🥳 1