Pedro Sousa Silva
11/14/2024, 11:09 PM
column a shouldn't have any nulls, column b should never be lower than 10))?
• I feel like it would be impossible to create dummy data to account for all edge cases in the test function itself
• Reading from the production input table, on the other hand, defeats the purpose of unit testing.
• Does it make sense to generate synthetic or sample data from the input tables to the node and store it somewhere to be read at testing time?
Hall
11/14/2024, 11:09 PM
Nok Lam Chan
11/15/2024, 12:06 AM
Nok Lam Chan
11/15/2024, 12:09 AM
hypothesis that take this to the next level, though I am not sure if that's what you are looking for.
How does the complexity of the node affect this test? I guess what you want to do is test the input/output, but not necessarily every intermediate output.
Yury Fedotov
11/15/2024, 2:46 AM
pandera. That library allows you to define all the conditions you want to enforce on your dataset and validate any dataset against them (in your example, I guess, the output of that function). Pandera can also generate synthetic data automatically, based on the schema definition.
This is literally what it’s designed for:
> column a shouldn't have any nulls, column b should never be lower than 10))
Pedro Sousa Silva
11/15/2024, 10:42 AM
Pedro Sousa Silva
11/15/2024, 10:43 AM
Pedro Sousa Silva
11/22/2024, 3:58 PM
Juan Luis
11/22/2024, 4:51 PM
> column a shouldn't have any nulls, column b should never be lower than 10))?
this sounds like property-based testing to me! https://hypothesis.readthedocs.io/en/latest/quickstart.html
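For illustration, a minimal property-based test for those two rules might look like the sketch below. It assumes pandas input and a made-up `clean_table` node. The column names, the value ranges, and the number of examples are arbitrary choices, not something taken from the thread:

```python
import pandas as pd
from hypothesis import given, settings, strategies as st
from hypothesis.extra.pandas import column, data_frames


def clean_table(df: pd.DataFrame) -> pd.DataFrame:
    """Placeholder node: drop rows with a null in "a" and clip "b" at 10."""
    out = df.dropna(subset=["a"]).copy()
    out["b"] = out["b"].clip(lower=10)
    return out


# Strategy for arbitrary inputs: "a" may contain NaN, "b" may be any
# integer in a small range, including values below 10.
raw_frames = data_frames(
    columns=[
        column("a", dtype=float,
               elements=st.floats(allow_nan=True, allow_infinity=False)),
        column("b", dtype=int,
               elements=st.integers(min_value=-1000, max_value=1000)),
    ]
)


@settings(max_examples=50, deadline=None)
@given(raw_frames)
def test_clean_table_properties(df):
    out = clean_table(df)
    # The two properties from the question must hold for every generated input.
    assert out["a"].notna().all()
    assert (out["b"] >= 10).all()
```

Instead of hand-writing dummy rows for each edge case, the test states the properties once and lets hypothesis search for inputs that break them.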
Juan Luis
11/22/2024, 4:52 PM
Juan Luis
11/22/2024, 4:53 PM
• Does it make sense to generate synthetic or sample data from the input tables to the node and store it somewhere to be read at testing time?
I've done this with a number of projects, yes!
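One possible shape for that workflow, as a sketch only: take a small reproducible sample of the node's input once, store it next to the tests, and read it back at test time. The helper names, the CSV format, and the `clean_table` node are all hypothetical:

```python
from pathlib import Path

import pandas as pd


def clean_table(df: pd.DataFrame) -> pd.DataFrame:
    """Placeholder node: drop rows with a null in "a" and clip "b" at 10."""
    out = df.dropna(subset=["a"]).copy()
    out["b"] = out["b"].clip(lower=10)
    return out


def save_sample(df: pd.DataFrame, path: Path, n: int = 100, seed: int = 0) -> None:
    """One-off helper: store a reproducible sample of a node's input table."""
    path.parent.mkdir(parents=True, exist_ok=True)
    df.sample(n=min(n, len(df)), random_state=seed).to_csv(path, index=False)


def load_sample(path: Path) -> pd.DataFrame:
    """Read the stored sample back at test time."""
    return pd.read_csv(path)


def check_node_on_sample(path: Path) -> None:
    """Run the node on the stored sample and check the two rules."""
    out = clean_table(load_sample(path))
    assert out["a"].notna().all()
    assert (out["b"] >= 10).all()
```

`save_sample` would be run once against the real input table, outside the test suite, so the tests themselves only ever touch the stored file.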
Pedro Sousa Silva
12/11/2024, 2:34 PM
Nishant Rathi
12/11/2024, 2:39 PM