Zubin Roy
08/15/2024, 10:38 AM
From 0.19.1 onwards, the errors you get back in your terminal are misleading. For example, see the error below, which I am currently getting.
ValueError: Pipeline does not contain nodes named ['download_and_validate_files_node'].
This always makes me think I've done something wrong with my Kedro imports, but in reality the actual error is this:
raise SchemaInitError(
pandera.errors.SchemaInitError: custom check 'checks' is not available. Make sure you use pandera.extensions.register_check_method decorator to register your custom check method.
But you have to scroll all the way to the top of your terminal to find it, and it is not always straightforward to figure out what the actual error was. (I only know to look there because I know this is what happens in the current version of Kedro; if you were new to Kedro, I reckon this would be really confusing to figure out.) In previous Kedro versions, the actual error would have been the one at the bottom of the terminal. I think it's something to do with the find_pipelines() function, but I wanted to know if this is a known issue and if there is a way to resolve it. Thanks!
Nok Lam Chan
08/15/2024, 3:08 PM
"previous Kedro version"
Which one were you on?
Nok Lam Chan
08/15/2024, 3:09 PM
def find_pipelines(raise_errors: bool = False)
It has an option to explicitly raise errors. The reason behind this is that a project usually has lots of pipelines, and you don't want your pipeline to error out because of an irrelevant pipeline.
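A rough sketch of the behaviour described above, assuming nothing about Kedro's internals: `discover`, `run_node`, `FACTORIES`, and the pipeline names are hypothetical stand-ins, not Kedro APIs. It illustrates how skipping a broken pipeline by default leaves only a warning behind, so a later node lookup fails with an unrelated-looking ValueError, whereas raise_errors=True surfaces the original exception.

```python
import warnings


def discover(factories, raise_errors=False):
    """Build every pipeline; skip the ones that fail unless raise_errors is set."""
    found = {}
    for name, factory in factories.items():
        try:
            found[name] = factory()
        except Exception as exc:
            if raise_errors:
                raise  # surface the original error immediately
            warnings.warn(f"Skipping pipeline {name!r}: {exc!r}")
    return found


def run_node(pipelines, node_name):
    """Mimic selecting a node by name from whatever pipelines were registered."""
    known = {node for nodes in pipelines.values() for node in nodes}
    if node_name not in known:
        raise ValueError(f"Pipeline does not contain nodes named ['{node_name}'].")
    return node_name


def reporting():
    return ["build_report_node"]


def elo_ratings():
    # Hypothetical missing dependency, standing in for any import-time failure.
    import a_package_that_is_not_installed  # noqa: F401
    return ["generate_elos_atp_node"]


FACTORIES = {"reporting": reporting, "elo_ratings": elo_ratings}

# Default: the broken pipeline is skipped and only a warning records why.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pipelines = discover(FACTORIES)

try:
    run_node(pipelines, "generate_elos_atp_node")
except ValueError as exc:
    late_error = exc  # the symptom, not the cause

# With raise_errors=True the root cause is the error that gets raised.
try:
    discover(FACTORIES, raise_errors=True)
except ImportError as exc:
    root_cause = exc
```

In this sketch the only trace of the real problem in the default mode is the warning captured in `caught`, which mirrors having to scroll up past the final ValueError.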
Zubin Roy
08/15/2024, 4:48 PM
version==0.19.3. But I don't think I explained myself very well. If I try to run the function below, it will fail because I have not imported pandas as pd, right? But the immediate error returned to my terminal is ValueError: Pipeline does not contain nodes named ['generate_elos_atp_node'], which is misleading because that is not the reason the node failed. The actual error is printed in white above all the red errors, but in previous versions of Kedro that error used to be the last thing returned to you in the terminal, and I was wondering why that has changed.
Does that make more sense?
def generate_elos(
    match_df: pd.DataFrame,
    params: dict,
):
    """
    Args:
        match_df:
        params:
    Returns:
    """
    return None
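One way to check the question above is a small sketch that defines the function without importing pandas. It uses only the standard library, and `SOURCE` and `namespace` are illustrative names. On Python versions before 3.14, parameter annotations are evaluated when the `def` statement runs, so the failure should occur as soon as the containing module is imported (as a NameError), before any node runs. From Python 3.14, annotations are evaluated lazily, so this particular definition would not fail at import time.

```python
import sys

# The function from the message above, deliberately without `import pandas as pd`.
SOURCE = '''
def generate_elos(
    match_df: pd.DataFrame,
    params: dict,
):
    return None
'''

namespace = {}
try:
    # Executing the source stands in for importing the module that contains it.
    exec(SOURCE, namespace)
    error = None
except NameError as exc:
    error = exc

# Shows the interpreter version next to the captured error (or None).
print(sys.version_info[:2], repr(error))
```

Because the failure happens while the module is being loaded, pipeline discovery never gets a chance to register the node, which would explain why the later message only says the node is missing.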
Nok Lam Chan
08/15/2024, 5:11 PM
find_pipelines(raise_errors=True) (you may need 0.19.6 or 0.19.7)
Nok Lam Chan
08/15/2024, 5:13 PM
"The reason behind this is that a project usually has lots of pipelines, and you don't want your pipeline to error out because of an irrelevant pipeline."
This is exactly why this happens. In this case you are using pandas, so it's obvious. But imagine a data engineer working on a Spark pipeline and hitting dependency issues because the data scientists are training models with PyTorch, TensorFlow, etc., which the data engineer does not need.
The design is that those pipelines are skipped so you can continue to run whatever pipeline is ready.
Nok Lam Chan
08/15/2024, 5:14 PM
raise_errors=True is useful in that context.
Nok Lam Chan
08/15/2024, 5:15 PM
ImportError, thus the pipeline is being skipped. I agree this could be confusing sometimes.
Zubin Roy
08/15/2024, 5:27 PM