I have a question regarding `kedro-pandera` <@U03R...
# plugins-integrations
a
I have a question regarding `kedro-pandera` @Nok Lam Chan - why does the hook validate datasets before the node run instead of when a dataset is loaded/saved? It leads to the same data being re-validated multiple times when a dataset is shared among nodes
m
I think it's mainly because they have a catalog variable available when using before/after node run. Btw even if you switched to before/after dataset saved/loaded, you would still validate multiple times. The only way to avoid that is by adding additional logic to the hook to only validate on load when it's a "free" input (`pipeline.inputs()` in kedro's language). Would be a nice addition though…
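For illustration, the "free input" logic described above could look roughly like the sketch below. It is framework-free so it runs on its own: the method names mirror Kedro's `before_pipeline_run` and `after_dataset_loaded` hooks, but the class, the `validate` callable and `FakePipeline` are hypothetical stand-ins, and this is not how `kedro-pandera` is actually implemented.

```python
class FreeInputValidationHooks:
    """Hypothetical sketch: validate on load only for free pipeline inputs."""

    def __init__(self, validate):
        # `validate` is an assumed callable: (dataset_name, data) -> None
        self._validate = validate
        self._free_inputs = set()

    def before_pipeline_run(self, run_params, pipeline, catalog):
        # pipeline.inputs() gives the names of datasets no node produces
        self._free_inputs = set(pipeline.inputs())

    def after_dataset_loaded(self, dataset_name, data, node=None):
        # Intermediate datasets are skipped here; only free inputs are checked
        if dataset_name in self._free_inputs:
            self._validate(dataset_name, data)


class FakePipeline:
    """Stand-in for a Kedro pipeline, only so the sketch runs by itself."""

    def __init__(self, free_inputs):
        self._free_inputs = set(free_inputs)

    def inputs(self):
        return self._free_inputs


validated = []
hooks = FreeInputValidationHooks(lambda name, data: validated.append(name))
hooks.before_pipeline_run(run_params={}, pipeline=FakePipeline({"raw_data"}), catalog=None)
hooks.after_dataset_loaded("raw_data", data=[1, 2, 3])      # free input: validated
hooks.after_dataset_loaded("intermediate", data=[4, 5])     # produced by a node: skipped
print(validated)
```

In a real project the two methods would presumably be registered as Kedro hook implementations. Note that this only filters out non-free datasets; it does nothing about a free input being loaded more than once.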
n
I am slightly out of date with `kedro-pandera` development lately, so pinging @Yolan Honoré-Rougé here. If there is no response this week I will come back to this next week. I am a bit overloaded at the moment with `vscode-kedro` and various other things
a
Yes, you're right. I tested the solution with after-dataset-loaded, and sadly this hook behaves differently than I expected: the dataset is "loaded" every time it is passed to a node
And in the newest version I see an exception has been added to avoid revalidating multiple times by tracking the set of validated datasets, which is something I wanted to add
I just had the thought that validation should occur only once on loading, and every time before saving
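A minimal sketch of that idea (validate once on load, every time before save), assuming a hypothetical `validate` callable. The method names mirror Kedro's `after_dataset_loaded` and `before_dataset_saved` hooks; everything else is made up for illustration and is not the actual `kedro-pandera` code.

```python
class LoadOnceSaveAlwaysHooks:
    """Hypothetical sketch: validate once per dataset on load, always on save."""

    def __init__(self, validate):
        # `validate` is an assumed callable: (dataset_name, data) -> None
        self._validate = validate
        self._validated_on_load = set()

    def after_dataset_loaded(self, dataset_name, data, node=None):
        # The hook fires for every node that consumes the dataset,
        # so remember which names have already been checked
        if dataset_name in self._validated_on_load:
            return
        self._validate(dataset_name, data)
        self._validated_on_load.add(dataset_name)

    def before_dataset_saved(self, dataset_name, data, node=None):
        # Freshly produced data is always checked before it is written
        self._validate(dataset_name, data)


calls = []
hooks = LoadOnceSaveAlwaysHooks(lambda name, data: calls.append(name))
hooks.after_dataset_loaded("orders", data=[1, 2])   # first load: validated
hooks.after_dataset_loaded("orders", data=[1, 2])   # shared by a second node: skipped
hooks.before_dataset_saved("report", data=[3])      # validated
hooks.before_dataset_saved("report", data=[3])      # validated again
print(calls)
```

The tracking set lives on the hook instance, so it would need resetting between runs if the same instance is reused.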