Great Expectations 1.0! <https://github.com/great-...
# resources
d
How do you have a 1.0 release (after all this time) while saying Python 3.12 support is experimental and a lot of the APIs are still unclear? Haven't looked into it recently, but I was hoping to see a better indicator of stability, given that my past attempt to integrate was thwarted by there being new and old ways of doing the same things, seemingly in continuous flux.
m
That’s one of the reasons I switched to pandera. That, and the fact that GX doesn’t have Polars support and pulls in a ton of libraries I don’t need in production (the whole Jupyter ecosystem).
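For context, a minimal sketch of what the pandera + Polars combination looks like (using pandera's `pandera.polars` API; the schema and column names here are made up for illustration):
```python
import polars as pl
import pandera.polars as pa  # Polars-native pandera API

# Hypothetical contract for an orders table -- names and checks are illustrative.
schema = pa.DataFrameSchema(
    {
        "order_id": pa.Column(str),
        "amount": pa.Column(float, pa.Check.ge(0)),  # no negative amounts
        "country": pa.Column(str, pa.Check.isin(["DE", "NL", "FR"])),
    }
)

df = pl.DataFrame(
    {
        "order_id": ["a-1", "a-2"],
        "amount": [10.0, 25.5],
        "country": ["DE", "NL"],
    }
)

# Raises pandera.errors.SchemaError if a check fails;
# returns the validated frame otherwise.
validated = schema.validate(df)
print(validated)
```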
👀 1
p
piggy-backing on this: what are people's favorite ways to define data quality / structure checks?
googling pandera
y
@Pascal Brokmeier my thinking on this:
• If my goal is to validate something about some dataframes at runtime, and I want the pipeline to fail on error, I use `pandera`.
• If my goal is to take a dataset and statically check it, i.e. the goal is to understand data quality rather than fail some process on error, I think `ge` is the way to go (rough sketch below).
👍 1
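For the second case, a rough sketch of what a standalone check looks like with the GX 1.0 fluent API (the dataframe, asset names, and expectation here are placeholders, and details may differ between releases):
```python
import great_expectations as gx
import pandas as pd

# Placeholder data -- column name and values are illustrative.
df = pd.DataFrame({"passenger_count": [1, 2, 7]})

context = gx.get_context()  # ephemeral context by default

# Register the dataframe as a batch we can validate.
data_source = context.data_sources.add_pandas(name="pandas")
data_asset = data_source.add_dataframe_asset(name="my_asset")
batch_def = data_asset.add_batch_definition_whole_dataframe("whole_df")
batch = batch_def.get_batch(batch_parameters={"dataframe": df})

# Evaluate a single expectation and inspect the result,
# rather than failing the process.
expectation = gx.expectations.ExpectColumnValuesToBeBetween(
    column="passenger_count", min_value=1, max_value=6
)
result = batch.validate(expectation)
print(result.success)  # False here: 7 is out of range
```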
p
sounds like a good rule. We're definitely after the pandera case, since we keep adding more data sources to our system and we want to make sure we fail if they don't meet our expectations. Or we upgrade upstream data source versions and they sometimes change their data modelling approach, which introduces noise we'd like to catch.
👍 1
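One pattern that helps with exactly that (using pandera's pandas API; the dataset, columns, and file path below are invented): validate with `lazy=True` so a changed upstream model reports every failing check at once instead of stopping at the first.
```python
import pandas as pd
import pandera as pa

# Hypothetical contract for an upstream feed.
schema = pa.DataFrameSchema(
    {
        "user_id": pa.Column(str, nullable=False),
        "signup_date": pa.Column("datetime64[ns]"),
        "plan": pa.Column(str, pa.Check.isin(["free", "pro"])),
    },
    strict=True,  # fail if upstream adds or renames columns
)

df = pd.read_parquet("upstream_export.parquet")  # placeholder path

try:
    schema.validate(df, lazy=True)  # collect *all* failures
except pa.errors.SchemaErrors as err:
    # failure_cases is a dataframe listing every violated check --
    # useful for seeing exactly what an upstream change broke.
    print(err.failure_cases)
    raise
```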
m
That’s exactly how we use it too. Combined with kedro hooks, it’s actually quite powerful
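A minimal sketch of that combination, assuming kedro's hook API (the dataset name and schema are hypothetical): a hook that runs a pandera schema over each dataset as it is loaded, so the run fails fast on bad data.
```python
import pandera as pa
from kedro.framework.hooks import hook_impl

# Hypothetical mapping of catalog dataset names to pandera schemas.
SCHEMAS = {
    "raw_orders": pa.DataFrameSchema({"order_id": pa.Column(str)}),
}

class ValidationHooks:
    """Validate datasets against their pandera schema as they load."""

    @hook_impl
    def after_dataset_loaded(self, dataset_name, data):
        schema = SCHEMAS.get(dataset_name)
        if schema is not None:
            # Raises pandera.errors.SchemaError and aborts the run
            # if the data doesn't match the contract.
            schema.validate(data)

# Registered in settings.py:  HOOKS = (ValidationHooks(),)
```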