# questions
Hi team, is there a command to see the list of unrun kedro nodes (after a pipeline run fails)? For now I get a suggestion to run with --from-nodes "" as part of my error statement, but I would like to access the actual list. Thanks cc @datajoely
You could do this dynamically with an
but there is no out of the box command IIRC
hmm ok. How do I access the non run nodes within on_node_error? I could add this as an easy functionality for the client if time permits
Hey @datajoely, it looks like I have to implement this functionality. For the
function hook, do we have a way to track the remaining unrun nodes? I only see the next node to run in the function call. Second question, if I implement this functionality (and test it accordingly) and want it to be part of the general kedro library, what steps do I need to take to have it implemented?
ah so you would need to also define the
hook and maintain the run
object in
you can work out the downstream dependents
it’s not super clean
but it would work
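A minimal, library-independent sketch of what "work out the downstream dependents" means: starting from the failed node, follow the edges of the dependency graph to collect everything that can no longer run. The `consumers` mapping and the node names are hypothetical stand-ins, not Kedro objects.

```python
from collections import deque


def downstream_nodes(consumers, failed):
    """Return `failed` plus every node reachable from it, in breadth-first order."""
    seen = [failed]
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in consumers.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


# Hypothetical pipeline: clean -> features -> train -> report,
# plus an unrelated node that does not depend on the failed one.
consumers = {
    "clean": ["features"],
    "features": ["train"],
    "train": ["report"],
    "export_raw": [],
}
remaining = downstream_nodes(consumers, "features")
print(remaining)  # ['features', 'train', 'report']
```

Note that this only covers the failed node and its dependents; independent nodes that simply had not run yet (such as `export_raw` here) are not included.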
ok, thanks that makes sense. Will the pipeline object be deleted after the run breaks? Also, what will I be using to keep track of the (run nodes, failed node, and unrun nodes)? In the screenshot below, I see the from_nodes, to_nodes, and node_names.
yeah so that pipeline object is ephemeral
so you would have to use both hooks
I’m going to write pseudocode but you should get the gist
class FailureHooks:

    def __init__(self):       
        self.run_pipeline = None
sorry slack editor not being nice, gimme a sec
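from kedro.framework.hooks import hook_impl


class FailureHooks:
    """Remember the pipeline being run so it can be inspected when a node fails."""

    def __init__(self):
        self.run_pipeline = None

    @hook_impl
    def before_pipeline_run(self, pipeline):
        # Keep a reference to the (ephemeral) pipeline object for later.
        self.run_pipeline = pipeline

    @hook_impl
    def on_node_error(self, node):
        # The failed node plus everything downstream of it.
        remaining_pipeline = self.run_pipeline.from_nodes(node.name)
        # do stuff with this, e.g. log [n.name for n in remaining_pipeline.nodes]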
essentially pick up the pipeline object in one hook
and then work with it on node error
I think the above works but have written it without testing
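For the three-way bookkeeping asked about earlier (run nodes, failed node, unrun nodes), a plain-Python sketch could look like the following. `RunTracker` and its method names are hypothetical. In a Kedro project, `mark_done` would presumably be called from an `after_node_run` hook and `mark_failed` from `on_node_error`; the hook wiring is left out so the sketch runs without Kedro installed.

```python
class RunTracker:
    """Bookkeeping for one run: which nodes finished, which failed, which never ran."""

    def __init__(self, node_names):
        self.all_nodes = list(node_names)  # execution order of the whole run
        self.done = []
        self.failed = None

    def mark_done(self, name):
        self.done.append(name)

    def mark_failed(self, name):
        self.failed = name

    def unrun(self):
        # Everything that neither finished nor is the failed node itself.
        finished = set(self.done)
        return [n for n in self.all_nodes if n not in finished and n != self.failed]


# Hypothetical node names for illustration.
tracker = RunTracker(["clean", "features", "train", "report"])
tracker.mark_done("clean")
tracker.mark_failed("features")
print(tracker.done, tracker.failed, tracker.unrun())
```

Unlike the downstream-only view, this also reports nodes that are independent of the failure but had not been reached yet.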
ooh thanks for sharing, this makes sense. Two questions: 1. The current project has a ProjectHooks class, will the FailureHooks be a different class? 2. Will the on_node_error be called when --from-nodes is used? Also, will there be a way to see the leftover unrun nodes?
1. Simply add another class in
2. Yes it will be called 2b You need to log out the contents of
some way that the user can work with
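One way to log something "the user can work with" is to turn the node names into a ready-to-paste command. This is only a sketch: `resume_command` is a hypothetical helper, and it assumes `kedro run --from-nodes` accepts a comma-separated list of node names. An empty list is rejected instead of producing an empty `--from-nodes` value.

```python
def resume_command(node_names):
    """Build a `kedro run` command that restarts from the given nodes."""
    if not node_names:
        raise ValueError("no nodes to resume from")
    return 'kedro run --from-nodes "{}"'.format(",".join(node_names))


# Hypothetical node name for illustration.
print(resume_command(["features"]))  # kedro run --from-nodes "features"
```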
Ok great. This is super helpful
one last question, there are times when I get an error and the error output shows --from-nodes "", and when I re-run with that command, the entire pipeline runs. What situations lead to --from-nodes being an empty string? How do I avoid this situation when adding this run-from-unrun-nodes functionality?
so I’m not sure - it’s worth looking at the implementation to see how that’s generated
ok, please where can I find it?
but IIRC because you’re doing dynamic pipeline building I think you may bypass some of our checks
also remember (1) nodes may need to be named (2) the inputs need to be persisted
as you can’t resume from an ephemeral in-memory input
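Point (2) matters because a dataset that only lived in memory is gone after the failed run, so its producer has to run again. A rough sketch of that reasoning follows, using hypothetical node and dataset names and a plain dict instead of Kedro objects; it is not Kedro's own resume logic.

```python
def nodes_to_rerun(nodes, done, persisted):
    """Return the nodes to run on resume, in execution order.

    nodes: dict of name -> (inputs, outputs), in execution order.
    A finished node is added back when one of its outputs is needed
    by a node that still has to run and was not persisted.
    """
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    to_run = {name for name in nodes if name not in done}
    changed = True
    while changed:
        changed = False
        for name in list(to_run):
            for dataset in nodes[name][0]:
                source = producer.get(dataset)
                if source and source not in to_run and dataset not in persisted:
                    to_run.add(source)
                    changed = True
    return [name for name in nodes if name in to_run]


nodes = {
    "clean": (["raw"], ["cleaned"]),
    "features": (["cleaned"], ["feature_table"]),
    "train": (["feature_table"], ["model"]),
}
# feature_table was only in memory, so `features` must run again before `train`.
print(nodes_to_rerun(nodes, done={"clean", "features"}, persisted={"raw", "cleaned"}))
# With feature_table persisted, resuming from `train` alone is enough.
print(nodes_to_rerun(nodes, done={"clean", "features"},
                     persisted={"raw", "cleaned", "feature_table"}))
```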
oh this is great. it looks like the
functionality you shared, might be what I need. Is there a way for me to access this function easily within my pipeline runs. Can I do something like
Hey @datajoely, thank you for your help yesterday. I was able to implement the re-run functionality based on the approach you described, and it looks like it is working. I have an edge case though: when I am running the pipeline for multiple groups, there are situations where the node might run for some specific groups and fail for others. Right now my implementation re-runs the failed node for all the groups (including those that might have succeeded). Is there a way to avoid that? In the kedro runner implementation, there is a local done_nodes variable that is stored. Is there a way for me to access it? Here is the link: https://github.com/kedro-org/kedro/blob/fa8c56fa2e510e6a449f5ac7356f76c167be978a/kedro/runner/sequential_runner.py#L71
so you could subclass the native runner
and add publicly accessible attributes?
can I subclass the native runner with the FailureHooks class I created already? Or will this be a different class? Also, I know the suggest-resume-scenario function (private) receives the done_nodes variable. Can I extend that functionality in the FailureHooks to save the done_nodes?
self._suggest_resume_scenario(pipeline, done_nodes, catalog)
I think that would work
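The "subclass the runner and expose the state" idea can be illustrated with a toy runner. `ToyRunner` and `TrackingRunner` are hypothetical stand-ins, not Kedro classes, and a real subclass would have to match the `_run` signature of the Kedro version in use. The point of the sketch is that a local variable is lost when an exception propagates, while an attribute on the runner survives.

```python
class ToyRunner:
    """Stand-in for a sequential runner that tracks progress in a local variable."""

    def _run(self, nodes):
        done_nodes = set()  # local: not reachable once an exception propagates
        for name, func in nodes:
            func()
            done_nodes.add(name)


class TrackingRunner(ToyRunner):
    """Same loop, but progress is stored on the instance so callers can read it."""

    def __init__(self):
        self.done_nodes = set()

    def _run(self, nodes):
        self.done_nodes = set()
        for name, func in nodes:
            func()
            self.done_nodes.add(name)


def fail():
    raise RuntimeError("boom")


runner = TrackingRunner()
try:
    runner._run([("clean", lambda: None), ("features", fail), ("train", lambda: None)])
except RuntimeError:
    pass
print(runner.done_nodes)  # {'clean'}
```

The trade-off is that the subclass re-implements the run loop, so it has to be kept in sync with the base class whenever that changes.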
Thanks @datajoely, I got the steps to work for me