#questions

Erwin

07/11/2023, 7:03 PM
Hello! I’m currently working with Kedro and Pandera for PySpark. I’m looking for some guidance on how to validate schemas and would appreciate any best practices or references you can provide. I have a few specific questions: 1. Where is the recommended place to define the schema? Should I build it using hooks from parameters (yaml)? Create a `schema.py` file? Or define the schema directly in `nodes.py`? I would greatly appreciate any help or suggestions you can offer. Thank you!

Nok Lam Chan

07/11/2023, 7:26 PM
I don't have much experience with pandera yet. What would the schema look like? Building it from a hook sounds reasonable to me.
On the other hand, I know there is some development on a kedro-pandera plugin, and a team recently helped add native PySpark support to pandera: https://github.com/unionai-oss/pandera/issues/1138
👍 1
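
To make the "build it from parameters in a hook" option concrete, here is a minimal plain-Python sketch. It is not Pandera's or Kedro's API: `SCHEMA_PARAMS`, `build_schema` and `check_dtypes` are hypothetical names, and the YAML layout is assumed. The check works on `(name, dtype)` pairs, the shape that PySpark's `df.dtypes` returns, so it runs without a Spark session.

```python
# Hypothetical sketch: SCHEMA_PARAMS, build_schema and check_dtypes are
# illustrative names, not part of Kedro or Pandera.

# Assumed shape of a `schemas:` entry after parameters.yml has been loaded.
SCHEMA_PARAMS = {
    "orders": {
        "order_id": {"dtype": "bigint", "required": True},
        "amount": {"dtype": "double", "required": True},
        "note": {"dtype": "string", "required": False},
    }
}


def build_schema(params, dataset_name):
    """Return {column: (dtype, required)} for one dataset."""
    columns = params[dataset_name]
    return {
        name: (spec["dtype"], spec.get("required", True))
        for name, spec in columns.items()
    }


def check_dtypes(actual_dtypes, schema):
    """Compare (name, dtype) pairs, e.g. from df.dtypes, against the schema.

    Returns a list of error strings; an empty list means the check passed.
    """
    actual = dict(actual_dtypes)
    errors = []
    for name, (dtype, required) in schema.items():
        if name not in actual:
            # Optional columns may be absent without raising an error.
            if required:
                errors.append(f"missing column: {name}")
        elif actual[name] != dtype:
            errors.append(f"{name}: expected {dtype}, got {actual[name]}")
    return errors


schema = build_schema(SCHEMA_PARAMS, "orders")
# Stand-in for `df.dtypes` on a PySpark DataFrame.
sample_dtypes = [("order_id", "bigint"), ("amount", "string")]
errors = check_dtypes(sample_dtypes, schema)
print(errors)
```

In a Kedro project, a check like this could be called from a `before_node_run` hook on each input DataFrame. With Pandera, the hand-written `check_dtypes` would presumably be replaced by a Pandera schema built from the same parameters.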

Juan Luis

07/12/2023, 9:38 AM

Erwin

07/12/2023, 4:15 PM
Thanks!