Hey all. I think from about kedro `0.19.1` onwards the errors... Kedro #questions

Zubin Roy

08/15/2024, 10:38 AM

Hey all. I think from about kedro `0.19.1` onwards the errors you get back in your terminal are misleading. For example, see the error below that I am currently getting.

ValueError: Pipeline does not contain nodes named ['download_and_validate_files_node'].

This always makes me think I've done something wrong with my Kedro imports, but in reality the actual error is this:

raise SchemaInitError(
pandera.errors.SchemaInitError: custom check 'checks' is not available. Make sure you use pandera.extensions.register_check_method decorator to register your custom check method.

But you have to scroll all the way to the top of your terminal, and it is not always straightforward to figure out what the actual error was. (I only know to look there because I know this is what happens in the current version of Kedro; if you were new to Kedro, I reckon this would be really confusing to figure out.) In previous Kedro versions the actual error would have been the one at the bottom of the terminal. I think it's something to do with the `find_pipelines()` function. Is this a known issue, and is there a way to resolve it? Thanks!

Nok Lam Chan

08/15/2024, 3:08 PM

previous Kedro version

Which one were you on?

Nok Lam Chan

08/15/2024, 3:09 PM

`def find_pipelines(raise_errors: bool = False)` has an option to explicitly raise errors. The reason behind the default is that a project usually has lots of pipelines, and you don't want your run to error out because of an irrelevant pipeline.

Zubin Roy

08/15/2024, 4:48 PM

Hi @Nok Lam Chan, thanks for replying. I am using kedro `version==0.19.3`, but I don't think I explained myself very well. If I try to run the function below, it will fail because I have not imported pandas as `pd`, right? But the immediate error returned to my terminal is

ValueError: Pipeline does not contain nodes named ['generate_elos_atp_node'].

which is misleading, because that is not the reason the node failed. The actual error is printed in white above all the red errors. In previous versions of Kedro that error used to be the last thing returned in the terminal, and I was wondering why that has changed. Does that make more sense?

def generate_elos(
    match_df: pd.DataFrame,
    params: dict,
):
    """

    Args:
        match_df:
        params:

    Returns:

    """

    return None

Nok Lam Chan

08/15/2024, 5:11 PM

I know I am repeating myself, but have you tried `find_pipelines(raise_errors=True)`? (You may need 0.19.6 or 0.19.7.)
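For anyone reading later, here is a minimal sketch of where that flag would go, assuming the project still uses the default `pipeline_registry.py` layout from the Kedro starter template and a Kedro version whose `find_pipelines` accepts `raise_errors`. Treat it as an illustration rather than the one correct setup.

```python
"""Project pipelines (src/<package_name>/pipeline_registry.py)."""
from kedro.framework.project import find_pipelines
from kedro.pipeline import Pipeline


def register_pipelines() -> dict[str, Pipeline]:
    """Register the project's pipelines.

    With raise_errors=True, a pipeline module that fails to load stops
    discovery immediately, so the root-cause exception is the last thing
    in the terminal instead of a warning far above a later ValueError.
    """
    pipelines = find_pipelines(raise_errors=True)
    # Same default-pipeline convention as the starter template.
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines
```

The trade-off is the one discussed in this thread: with the flag on, one broken pipeline blocks every `kedro run`, including runs of unrelated pipelines.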

Nok Lam Chan

08/15/2024, 5:13 PM

"The reason behind this is that a project usually has lots of pipelines, and you don't want your run to error out because of an irrelevant pipeline."

This is exactly why this happens. In this case you are using `pandas`, so it's obvious. Imagine a data engineer working on a Spark pipeline but hitting dependency issues because the data scientists are training models with PyTorch/TensorFlow etc.; these are not needed for the DE. The design is that Kedro skips those pipelines so you can continue to run whatever pipeline is ready.

Nok Lam Chan

08/15/2024, 5:14 PM

In production you certainly want it to fail eagerly, so `raise_errors=True` is useful in that context.
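One way to get lenient discovery locally and eager failure in production is to drive the flag from the environment. The sketch below is only an assumption about how that could look: the variable name `STRICT_PIPELINE_DISCOVERY` and the helper `strict_discovery` are made up for illustration and are not part of Kedro.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name chosen for this sketch; Kedro does not define it.
STRICT_ENV_VAR = "STRICT_PIPELINE_DISCOVERY"


def strict_discovery(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the environment asks for eager failure.

    Accepts "1", "true" or "yes" (case-insensitive, surrounding
    whitespace ignored); anything else, or no value, means lenient.
    """
    env = os.environ if environ is None else environ
    return env.get(STRICT_ENV_VAR, "").strip().lower() in {"1", "true", "yes"}
```

The result could then be passed along as `find_pipelines(raise_errors=strict_discovery())` in the pipeline registry, with the variable set only in the production deployment.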

Nok Lam Chan

08/15/2024, 5:15 PM

The error you get is the result of an `ImportError`: the pipeline fails to import, so it is skipped. I agree this can be confusing sometimes.
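To make the chain of events concrete, here is a toy model of skip-on-failure discovery in plain Python. It is not Kedro's implementation: `discover`, `select_nodes`, the loader functions and the representation of a pipeline as a list of node names are all invented for this sketch. It only shows why the last error in the terminal ends up being the `ValueError` rather than the root cause.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

# Simplified stand-in: a "pipeline" is just a list of node names.
Pipeline = list[str]


def discover(
    loaders: dict[str, Callable[[], Pipeline]], raise_errors: bool = False
) -> dict[str, Pipeline]:
    """Load every pipeline; on failure either re-raise or warn and skip."""
    found: dict[str, Pipeline] = {}
    for name, load in loaders.items():
        try:
            found[name] = load()
        except Exception as exc:
            if raise_errors:
                raise  # eager failure: the root cause is the final error
            # Lenient default: the root cause is only logged as a warning.
            logger.warning("Skipping pipeline %r: %s: %s", name, type(exc).__name__, exc)
    return found


def select_nodes(pipelines: dict[str, Pipeline], node_names: list[str]) -> list[str]:
    """Fail if a requested node is missing from every discovered pipeline."""
    available = {node for nodes in pipelines.values() for node in nodes}
    missing = [n for n in node_names if n not in available]
    if missing:
        raise ValueError(f"Pipeline does not contain nodes named {missing}.")
    return list(node_names)


def load_ingest() -> Pipeline:
    return ["download_files_node"]


def load_elos() -> Pipeline:
    # Simulates a pipeline module that cannot be imported.
    raise ImportError("simulated import failure in the elos pipeline")


LOADERS = {"ingest": load_ingest, "elos": load_elos}
```

Calling `select_nodes(discover(LOADERS), ["generate_elos_atp_node"])` logs a warning for the skipped pipeline and then raises the `ValueError`, whereas `discover(LOADERS, raise_errors=True)` raises the `ImportError` straight away.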

Zubin Roy

08/15/2024, 5:27 PM

Understood. That makes more sense and I understand the context for why you would want to skip pipelines etc. Thanks!
