I’m currently working with Kedro and Pandera for PySpark. I’m looking for some guidance on how to validate schemas and would appreciate any best practices or references you can provide.
I have a few specific questions:
1. Where is the recommended place to define the schema? Should I build it using hooks from parameters (yaml)? Create a
file? Or define the schema directly in
I would greatly appreciate any help or suggestions you can offer. Thank you!
Nok Lam Chan
07/11/2023, 7:26 PM
I don't have much experience with Pandera yet. What would the schema look like? Building it from a hook sounds reasonable to me.