Hi all! The Kedro documentation has a nice example of how to validate data with Great Expectations, but it only looks at one dataset at a time. What would I do if I need to validate the data of a node that merges two datasets? Let's say one table is a lookup table and the other table may only contain entries that exist in the lookup table. Has anyone ever checked multiple datasets at a time? Do you have an example of that?
Nok Lam Chan
10/07/2024, 7:40 PM
Can you validate the output after merging?
datajoely
10/08/2024, 6:51 AM
Yeah, you're touching on the difference between testing on persisted versus in-memory data
datajoely
10/08/2024, 6:53 AM
I saw this the other day
datajoely
10/08/2024, 6:53 AM
GE falls towards the end of the spectrum
datajoely
10/08/2024, 6:54 AM
Testing the cardinality of a join, for example, is something you could validate on persisted outputs, or you could "shift left" and test it at execution time with something like Pandera
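For the lookup-table case, a rough sketch of the "shift left" idea using only pandas inside the merging node might look like the following. The function name `merge_with_lookup`, the `key` parameter, and the column names are made up for illustration; they are not from the Kedro docs, Great Expectations, or Pandera.

```python
import pandas as pd


def merge_with_lookup(
    entries: pd.DataFrame, lookup: pd.DataFrame, key: str = "code"
) -> pd.DataFrame:
    """Hypothetical Kedro node: join `entries` to `lookup`, failing fast on bad keys."""
    # validate="many_to_one" makes pandas raise MergeError when `key`
    # is duplicated in the lookup table (a join cardinality check).
    merged = entries.merge(
        lookup, on=key, how="left", validate="many_to_one", indicator=True
    )
    # Rows flagged "left_only" carry a key that is absent from the lookup table.
    missing = merged.loc[merged["_merge"] == "left_only", key].unique().tolist()
    if missing:
        raise ValueError(f"Keys not found in lookup table: {missing}")
    return merged.drop(columns="_merge")
```

Because the node receives both datasets as inputs, the check runs in memory before anything is persisted. Validating the merged output afterwards, as suggested earlier in the thread, would still work as a second line of defence.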