# questions
e
Hello! I’m currently working with Kedro and Pandera for PySpark. I’m looking for some guidance on how to validate schemas and would appreciate any best practices or references you can provide. I have a few specific questions: 1. Where is the recommended place to define the schema? Should I build it using hooks from parameters (yaml)? Create a `schema.py` file? Or define the schema directly in `nodes.py`? I would greatly appreciate any help or suggestions you can offer. Thank you!
n
I don't have much experience with pandera yet. What would the schema look like? Building it from a hook sounds reasonable to me.
On the other hand, I know there is some development on the kedro-pandera plugin, and a team recently helped add native PySpark support to pandera: https://github.com/unionai-oss/pandera/issues/1138
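For illustration only, here is a rough sketch of what the "build the schema from parameters" option could look like. Everything named here is an assumption rather than an established Kedro or kedro-pandera convention: the `SCHEMA_PARAMS` layout (standing in for a hypothetical `schema:` entry in `parameters.yml`), the `DTYPE_NAMES` mapping, and the helper functions `column_specs`, `build_schema`, and `validate_node`. It also assumes `pandera.pyspark` exposes `DataFrameSchema` and `Column` and that validation errors are collected on `df.pandera.errors`, so please check this against the pandera version you have installed.

```python
# Assumed short dtype names for the YAML file, mapped to pyspark.sql.types class names.
DTYPE_NAMES = {"int": "IntegerType", "string": "StringType", "double": "DoubleType"}

# Hypothetical dict, as Kedro would load it from a `schema:` key in parameters.yml.
SCHEMA_PARAMS = {
    "id": {"dtype": "int", "nullable": False},
    "product_name": {"dtype": "string", "nullable": True},
}


def column_specs(params):
    """Turn the parameters dict into (column, Spark type name, nullable) tuples."""
    specs = []
    for name, spec in params.items():
        dtype = spec["dtype"]
        if dtype not in DTYPE_NAMES:
            raise ValueError(f"Unsupported dtype {dtype!r} for column {name!r}")
        specs.append((name, DTYPE_NAMES[dtype], spec.get("nullable", False)))
    return specs


def build_schema(params):
    """Build a pandera schema for PySpark DataFrames (needs pandera[pyspark])."""
    # Imported lazily so the parameter parsing above works without Spark installed.
    import pandera.pyspark as pa
    import pyspark.sql.types as T

    return pa.DataFrameSchema(
        {
            name: pa.Column(getattr(T, type_name)(), nullable=nullable)
            for name, type_name, nullable in column_specs(params)
        }
    )


def validate_node(df, params):
    """Hypothetical Kedro node: validate a Spark DataFrame and pass it through."""
    validated = build_schema(params).validate(df)
    # Assumption: the PySpark integration collects failures on the accessor
    # instead of raising, and the collection is empty when the frame is valid.
    errors = validated.pandera.errors
    if errors:
        raise ValueError(f"Schema validation failed: {errors}")
    return validated
```

The same `build_schema` call could equally live in a hook or in a `schema.py` module imported by `nodes.py`; keeping the parameter parsing (`column_specs`) separate from the pandera call just makes the YAML handling easy to unit test without a Spark session.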
e
Thanks!