Hi team is there a command to see the list of unrun kedro no Kedro #questions

Dotun O

03/27/2023, 4:58 PM

Hi team, is there a command to see the list of unrun kedro nodes (after the pipeline code fails)? For now I get a suggestion to run with --from-nodes "" as part of my error statement, but I would like to access the actual list. Thanks cc @datajoely

datajoely

03/27/2023, 4:59 PM

You could do this dynamically with an on_node_error hook

datajoely

03/27/2023, 5:00 PM

but there is no out of the box command IIRC

Dotun O

03/27/2023, 5:02 PM

hmm ok. How do I access the non run nodes within on_node_error? I could add this as an easy functionality for the client if time permits

Dotun O

04/10/2023, 2:42 PM

Hey @datajoely, it looks like I have to implement this functionality. For the on_node_error function hook, do we have a way to track the remaining unrun nodes? I only see the next node to run in the function call. Second question, if I implement this functionality (and test it accordingly) and want it to be part of the general kedro library, what steps do I need to take to have it implemented?

datajoely

04/10/2023, 2:47 PM

ah so you would need to also define the before_pipeline_run hook and maintain the run pipeline object in self

datajoely

04/10/2023, 2:47 PM

you can work out the downstream dependents

datajoely

04/10/2023, 2:47 PM

it’s not super clean

datajoely

04/10/2023, 2:47 PM

but it would work
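
A minimal sketch of what "working out the downstream dependents" means, written in plain Python so it runs without Kedro. The `edges` mapping, the node names, and the `downstream_nodes` helper are all hypothetical stand-ins for illustration; in a real project the pipeline object's from_nodes method does this traversal for you.

```python
from __future__ import annotations

from collections import deque


def downstream_nodes(edges: dict[str, list[str]], start: str) -> list[str]:
    """Return `start` plus every node reachable from it, breadth-first."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical dependency graph: each key feeds the nodes in its list.
# "audit" depends on "clean" only, so it is not downstream of "features".
edges = {
    "clean": ["features", "audit"],
    "features": ["train"],
    "train": ["report"],
}

affected = downstream_nodes(edges, "features")
print(affected)
```

Note that "audit" is not in the result even though it may not have run yet. A from-the-failed-node view gives the failed node and its dependents, which is not necessarily the same thing as every node that has not run.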

Dotun O

04/10/2023, 2:54 PM

ok, thanks that makes sense. Will the pipeline object be deleted after the run breaks? Also, what will I be using to keep track of the (run nodes, failed node, and unrun nodes)? In the screenshot below, I see the from_nodes, to_nodes, and node_names.

datajoely

04/10/2023, 2:54 PM

yeah so that pipeline object is ephemeral

datajoely

04/10/2023, 2:54 PM

so you would have to use both hooks

datajoely

04/10/2023, 2:55 PM

I’m going to write pseudocode but you should get the gist

class FailureHooks:

    def __init__(self):       
        self.run_pipeline = None

👍 1

datajoely

04/10/2023, 2:56 PM

sorry slack editor not being nice, gimme a sec

datajoely

04/10/2023, 2:58 PM

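from kedro.framework.hooks import hook_impl


class FailureHooks:

    def __init__(self):
        self.run_pipeline = None

    @hook_impl
    def before_pipeline_run(self, pipeline):
        # Keep a reference to the pipeline being run so later hooks can use it
        self.run_pipeline = pipeline

    @hook_impl
    def on_node_error(self, node):
        # The failed node plus everything that depends on it
        remaining_pipeline = self.run_pipeline.from_nodes(node.name)
        # do stuff with this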

datajoely

04/10/2023, 2:58 PM

essentially pick up the pipeline object in one hook

datajoely

04/10/2023, 2:58 PM

and then work with it on node error

datajoely

04/10/2023, 2:58 PM

I think the above works but have written it without testing

Dotun O

04/10/2023, 3:02 PM

ooh thanks for sharing, this makes sense. Two questions: 1. The current project has a ProjectHooks class, will the FailureHooks be a different class? 2. Will the on_node_error be called when --from-nodes is used? Also will there be a way to see the leftover unrun nodes?

datajoely

04/10/2023, 3:03 PM

1. Simply add another class in settings.py
2. Yes it will be called
2b. You need to log out the contents of remaining_pipeline some way that the user can work with

👍 1
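
One possible way to log the remaining nodes in a form a user can act on, as a plain-Python sketch. The helper name `remaining_nodes_message` and the example node names are hypothetical; inside the hook the list of names could be built from the nodes of remaining_pipeline. The resume line assumes the failed node is named and its inputs are persisted.

```python
import logging

logger = logging.getLogger(__name__)


def remaining_nodes_message(failed_node, remaining):
    """Build a readable summary of what is left to run after a failure."""
    if not failed_node:
        # An empty name would produce an unusable --from-nodes "" command.
        raise ValueError("failed_node must be a non-empty node name")
    lines = [f"Node '{failed_node}' failed. Nodes still to run:"]
    lines += [f"  - {name}" for name in remaining]
    lines.append(f'Resume with: kedro run --from-nodes "{failed_node}"')
    return "\n".join(lines)


# Hypothetical node names, standing in for the names in remaining_pipeline.
message = remaining_nodes_message("features", ["features", "train", "report"])
logger.warning(message)
```

Raising on an empty name is a design choice in this sketch: it surfaces the problem at logging time instead of handing the user a command that reruns the whole pipeline.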

Dotun O

04/10/2023, 3:04 PM

Ok great. This is super helpful

Dotun O

04/10/2023, 3:08 PM

one last question, there are times whereby I get an error and the error output shows --from-nodes "", and when I re-run with the command, the entire pipeline runs. What situations lead to from-nodes being an empty string? How do I avoid this situation by adding this run from unrun node functionality?

datajoely

04/10/2023, 3:08 PM

so I’m not sure - it’s worth looking at the implementation of how that’s generated

Dotun O

04/10/2023, 3:09 PM

ok, please where can I find it?

datajoely

04/10/2023, 3:09 PM

but IIRC because you’re doing dynamic pipeline building I think you may bypass some of our checks

datajoely

04/10/2023, 3:09 PM

also remember (1) nodes may need to be named (2) the inputs need to be persisted

datajoely

04/10/2023, 3:09 PM

as you can't resume from an ephemeral memory input

datajoely

04/10/2023, 3:10 PM

https://github.com/kedro-org/kedro/blob/be247baa768308aad4b9feac1d4d0fd0164caf78/kedro/runner/runner.py#L121

Dotun O

04/10/2023, 3:15 PM

oh this is great. it looks like the run_only_missing functionality you shared might be what I need. Is there a way for me to access this function easily within my pipeline runs? Can I do something like pipeline.run_only_missing()

Dotun O

04/11/2023, 1:06 PM

Hey @datajoely, thank you for your help yesterday. I was able to implement the re-run functionality based on the approach you described, and it looks like it is working. I have an edge case though: when I am running the pipeline for multiple groups, there are situations whereby the node might run for some specific groups and fail for others. Right now my implementation re-runs the failed node for all the groups (including those that might have succeeded). Is there a way to avoid that? In the kedro runner implementation, there is a local done nodes variable that is stored. Is there a way for me to access it? Here is the link: https://github.com/kedro-org/kedro/blob/fa8c56fa2e510e6a449f5ac7356f76c167be978a/kedro/runner/sequential_runner.py#L71

datajoely

04/11/2023, 1:27 PM

so you could subclass the native runner

datajoely

04/11/2023, 1:27 PM

and add publicly accessible attributes?

Dotun O

04/11/2023, 1:35 PM

can I subclass the native runner with the FailureHooks class I created already? Or will this be a different class? Also, I know the suggest resume scenario (private function) contains the done_nodes variable, can I extend that functionality in the FailureHooks to save the done_nodes?

self._suggest_resume_scenario(pipeline, done_nodes, catalog)

datajoely

04/11/2023, 2:27 PM

I think that would work

👍 1
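
A hedged sketch of one way to keep a done-nodes list inside the hooks class itself, without touching the runner. This is an alternative to the runner subclass discussed above, not the approach the thread settled on: it records each node in the after_node_run hook and derives the unrun list from the pipeline stored in before_pipeline_run. The `unrun_nodes` method and the stand-in objects in the usage part are hypothetical, and the import fallback exists only so the sketch runs without Kedro installed.

```python
from types import SimpleNamespace

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch can run without Kedro installed.
    def hook_impl(func):
        return func


class FailureHooks:
    def __init__(self):
        self.all_nodes = []
        self.done_nodes = []
        self.failed_node = None

    @hook_impl
    def before_pipeline_run(self, pipeline):
        # Reset state and remember every node name in this run.
        self.all_nodes = [n.name for n in pipeline.nodes]
        self.done_nodes = []
        self.failed_node = None

    @hook_impl
    def after_node_run(self, node):
        # Called only for nodes that finished successfully.
        self.done_nodes.append(node.name)

    @hook_impl
    def on_node_error(self, node):
        self.failed_node = node.name

    def unrun_nodes(self):
        """Names that neither finished nor failed, in pipeline order."""
        done = set(self.done_nodes)
        return [n for n in self.all_nodes
                if n not in done and n != self.failed_node]


# Usage with stand-in objects (hypothetical names) instead of a real run.
nodes = [SimpleNamespace(name=n) for n in ("clean", "features", "train", "report")]
hooks = FailureHooks()
hooks.before_pipeline_run(SimpleNamespace(nodes=nodes))
hooks.after_node_run(nodes[0])
hooks.on_node_error(nodes[1])
print(hooks.failed_node, hooks.unrun_nodes())
```

This tracks whole nodes only. It does not by itself solve the per-group case, where a single node succeeds for some groups and fails for others; that would need the node to record its own per-group progress.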

Dotun O

04/12/2023, 4:11 PM

Thanks @datajoely, I got the steps to work for me
