# questions
z
Hey all. I think from about Kedro `0.19.1` onwards, the errors you get back in your terminal are misleading. For example, see the error below, which I am currently getting:
```text
ValueError: Pipeline does not contain nodes named ['download_and_validate_files_node'].
```
This always makes me think I've done something wrong with my Kedro imports, but in reality the actual error is this:
```text
raise SchemaInitError(
pandera.errors.SchemaInitError: custom check 'checks' is not available. Make sure you use pandera.extensions.register_check_method decorator to register your custom check method.
```
But you have to scroll all the way to the top of your terminal to find it, and it is not always straightforward to figure out what the actual error was. (And I only know to look there because this is what happens in the current version of Kedro; I reckon that if you were new to Kedro, this would be really confusing to figure out.) In previous Kedro versions, the actual error would have been the last one at the bottom of the terminal. I think it's something to do with the `find_pipelines()` function, but I wanted to know whether this is a known issue, and whether there is a way to resolve it. Thanks!
n
> previous Kedro version

Which one were you on?
`def find_pipelines(raise_errors: bool = False)` has an option to raise errors explicitly. The reason behind this is that a project usually has lots of pipelines, and you don't want your run to error out because of an irrelevant pipeline.
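To make the trade-off concrete, here is a toy model of pipeline autodiscovery with a `raise_errors` switch. It is a sketch under stated assumptions, not Kedro's implementation: `discover_pipelines`, both loader functions, and `hypothetical_missing_dl_library` are invented for illustration, and each loader stands in for importing one pipeline module. By default a pipeline that fails to load is skipped with a warning; with `raise_errors=True` the first failure is raised, chained to its original cause. In a real project the equivalent switch is the `raise_errors` argument of `find_pipelines`, which is typically called from `pipeline_registry.py`.

```python
import warnings
from typing import Callable, Dict, List


# Hypothetical stand-ins for two pipeline modules: each "loader" plays the role
# of importing a pipeline module and returns the node names it defines.
def load_data_engineering() -> List[str]:
    return ["clean_raw_data_node"]


def load_data_science() -> List[str]:
    # Simulates a dependency that is not installed in this environment.
    import hypothetical_missing_dl_library  # noqa: F401
    return ["train_model_node"]


def discover_pipelines(
    loaders: Dict[str, Callable[[], List[str]]], raise_errors: bool = False
) -> Dict[str, List[str]]:
    """Toy model of autodiscovery; not Kedro's actual implementation."""
    pipelines: Dict[str, List[str]] = {}
    for name, loader in loaders.items():
        try:
            pipelines[name] = loader()
        except Exception as exc:
            if raise_errors:
                # Fail eagerly, keeping the original error as __cause__.
                raise ImportError(f"Failed to load pipeline '{name}'") from exc
            # Default: report the root cause as a warning and keep going.
            warnings.warn(f"Skipping pipeline '{name}': {exc!r}")
    return pipelines


LOADERS = {"data_engineering": load_data_engineering, "data_science": load_data_science}

if __name__ == "__main__":
    print(discover_pipelines(LOADERS))  # only 'data_engineering' survives
    try:
        discover_pipelines(LOADERS, raise_errors=True)
    except ImportError as err:
        print(f"{err} (caused by {err.__cause__!r})")
```

The lenient default suits local development where some pipelines' dependencies are missing; the eager mode surfaces the real cause immediately.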
z
Hi @Nok Lam Chan, thanks for replying. I am using Kedro `version==0.19.3`, but I don't think I explained myself very well. If I try to run the function below, it will fail because I have not imported pandas as `pd`, right? But the immediate error returned to my terminal is `ValueError: Pipeline does not contain nodes named ['generate_elos_atp_node']`, which is misleading, because that is not the reason the node failed. The actual error appears in white above all the red errors. In previous versions of Kedro, that error used to be the last thing returned to you in the terminal, so I was wondering why that has changed. Does that make more sense?
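```python
# pandas is deliberately not imported here: this missing import is the real error.
# On Python versions that evaluate annotations eagerly (3.13 and earlier), the
# `pd.DataFrame` annotation raises NameError as soon as the module is imported.
def generate_elos(
    match_df: pd.DataFrame,
    params: dict,
):
    """

    Args:
        match_df:
        params:

    Returns:

    """

    return None
```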
n
I know I am repeating myself, but have you tried `find_pipelines(raise_errors=True)`? (You may need 0.19.6 or 0.19.7.) The reason behind this behaviour is that a project usually has lots of pipelines, and you don't want your run to error out because of an irrelevant pipeline.
This is exactly why this happens. In this case you are using `pandas`, so the cause is obvious. But imagine a data engineer working on a Spark pipeline who hits dependency issues because the data scientists are training models with PyTorch/TensorFlow etc., none of which the data engineer needs. The design is that those pipelines are simply skipped, so you can continue to run whichever pipelines are ready.
In production you certainly want it to fail eagerly, so `raise_errors=True` is useful in that context.
The error you get is the result of an `ImportError`, so the pipeline is skipped. I agree this can be confusing sometimes.
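The chain that produces the misleading message can be reproduced in a few lines. This is a hypothetical toy model: `ToyPipeline`, `load_elo_pipeline`, and `build_default_pipeline` are invented for illustration, and only the error wording is taken from this thread. The root cause is reported once, as a warning, while the failing pipeline is skipped; the `ValueError` about missing nodes is raised later, when the now-empty pipeline is asked for a node it never received, so it is the last thing printed.

```python
import warnings
from typing import Callable, Dict, List


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: an ordered list of node names."""

    def __init__(self, node_names: List[str]) -> None:
        self.node_names = list(node_names)

    def only_nodes(self, *names: str) -> "ToyPipeline":
        # Wording copied from the thread; the real check may differ in detail.
        missing = [n for n in names if n not in self.node_names]
        if missing:
            raise ValueError(f"Pipeline does not contain nodes named {missing}.")
        return ToyPipeline([n for n in self.node_names if n in names])


def load_elo_pipeline() -> List[str]:
    # Hypothetical: importing the node module fails because `pd` was never imported.
    raise NameError("name 'pd' is not defined")


def build_default_pipeline(loaders: Dict[str, Callable[[], List[str]]]) -> ToyPipeline:
    """Skip any pipeline that fails to load, then merge the rest (toy model)."""
    nodes: List[str] = []
    for name, loader in loaders.items():
        try:
            nodes.extend(loader())
        except Exception as exc:
            # The root cause is only reported here, early in the output.
            warnings.warn(f"Skipping pipeline '{name}': {exc!r}")
    return ToyPipeline(nodes)


if __name__ == "__main__":
    default = build_default_pipeline({"elo": load_elo_pipeline})
    try:
        default.only_nodes("generate_elos_atp_node")
    except ValueError as err:
        # The last, misleading error the user sees.
        print(err)
```

Run as a script, this emits the skipped-pipeline warning first and the "does not contain nodes" message last, which is why the real cause ends up far above the final error.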
z
Understood. That makes more sense, and I understand the context for why you would want to skip pipelines etc. Thanks!