# questions
j
Hey all, has anyone played around with kedro-pandera and partitioned datasets? It doesn't seem to be supported yet, and I was wondering whether it actually exists and I'm just not finding it, or whether anyone has found a smart workaround. Any other recommendations for data validation in Kedro pipelines are appreciated! Thanks!
v
Hey Javier, I am also looking for solutions around data validation. No success yet. For now I thought of simply using Pandera as an external library. Following the thread.
👍 1
➕ 1
d
@Nok Lam Chan and @Yolan Honoré-Rougé are the most knowledgeable on this, so I'll let them answer. Based on some other threads and a surface-level look, I think there are gaps like this in Kedro-Pandera where contributions would be very welcome if you wanted to make a PR.
"Any other recommendations for data validation in Kedro pipelines are appreciated!"
There's nothing more mature to my understanding; there have been Great Expectations integrations in the past, but nothing that I'm aware of that's public and maintained.
n
I haven't tried this myself, and it may not work out of the box, but the idea is pretty much the same: iterate the validation over the partitions. It shouldn't be a big change, so if PRs come in I can review them.
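To make the "iterate the validation over the partitions" idea concrete, here is a minimal sketch in plain Python. It is not kedro-pandera code. It assumes the partitioned dataset loads as a dict mapping partition IDs to zero-argument load functions (which is how Kedro's PartitionedDataset loads lazily), and `validate_partitions` is a hypothetical helper name. The `validate` argument is any callable that returns the data or raises; with Pandera that could be a schema's `validate` method.

```python
from typing import Any, Callable, Dict


def validate_partitions(
    partitions: Dict[str, Callable[[], Any]],
    validate: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Load each partition, validate it, and report every failing partition.

    `partitions` maps partition IDs to zero-argument load functions.
    `validate` returns the (possibly coerced) data or raises on failure.
    Both names are illustrative assumptions, not part of any plugin API.
    """
    validated: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}

    # Sort the IDs so the error report is deterministic.
    for partition_id, load_partition in sorted(partitions.items()):
        data = load_partition()  # the partition is only read here
        try:
            validated[partition_id] = validate(data)
        except Exception as exc:  # broad on purpose: collect, then report
            errors[partition_id] = exc

    if errors:
        details = "; ".join(f"{pid}: {err}" for pid, err in errors.items())
        raise ValueError(
            f"Validation failed for {len(errors)} partition(s): {details}"
        )
    return validated
```

Called from a node, this would take the loaded partitions plus something like `schema.validate`, and return the validated data keyed by partition ID. Collecting errors instead of stopping at the first one is a design choice here, so one run shows every bad partition; note that it holds all validated partitions in memory at once.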
y
Hello, I can't say much more than Nok and Deepyaman. It should be pretty straightforward to modify the hook to loop over the partitions, but plugin development is frozen on our side. I think it will resume one day, but I can't give a timeline. PRs are welcome though. Some research led me to https://gitlab.com/anacision/kedro-expectations which seems to be the most up-to-date validation plugin for now (roughly one release every 2 months given the history, and the documentation seems up to date). I have not tried it myself, so I can't really assess how good it is, but maybe it's worth giving it a try.
👀 1
👍 1
None of the developers seem to be on this Slack, unfortunately.