Kedro is an open-source Python framework for creating maintainable and modular data science code.

Hello!
I’m currently working with Kedro and Pandera for PySpark. I’m looking for some guidance on how to validate schemas and would appreciate any best practices or references you can provide.
I have a few specific questions:
1. Where is the recommended place to define the schema? Should I build it from the parameters (YAML) using hooks, create a `schema.py` file, or define the schema directly in `nodes.py`?
I would greatly appreciate any help or suggestions you can offer. Thank you!

I don't have much experience with Pandera yet. What would the schema look like? Building it from a hook sounds reasonable to me.

On the other hand, I know there is some development on a kedro-pandera plugin, and a team recently helped add native PySpark support to Pandera:

<https://github.com/unionai-oss/pandera/issues/1138>

Keep an eye on <https://github.com/Galileo-Galilei/kedro-pandera> 🙂