Jannik Wiedenhaupt
04/05/2023, 8:54 PM
Deepyaman Datta
04/05/2023, 10:08 PM
Is it even necessary to write tests for that?
Testing is always necessary! Nodes are just Python functions, so it makes sense to test them as such. For example, if you have a drop_sparse_columns(data: pd.DataFrame, threshold: float) node that removes any columns whose fraction of null values exceeds threshold, you could start off by writing a test that:
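1. Takes a basic dataframe input (e.g. test_data = pd.DataFrame({"a": [1, 2, None], "b": [1, None, None]}))
2. Calls got = drop_sparse_columns(test_data, 0.5)
3. Knows what to expect: expected = test_data.drop(columns="b")
4. Makes sure it worked properly: pd.testing.assert_frame_equal(got, expected) (a plain assert got == expected raises a ValueError, because == on DataFrames compares element by element)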
Of course, as you hit more edge cases or more complicated logic, you add more tests.
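The steps above can be sketched as a small pytest module. This is only an illustration: the body of drop_sparse_columns is a guess at what such a node might look like (the thread gives only its signature and intent), and the second test is a made-up edge case showing how the suite could grow. The comparison uses pd.testing.assert_frame_equal, since asserting on == between two DataFrames raises a ValueError instead of returning a single boolean.

```python
import pandas as pd


def drop_sparse_columns(data: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Hypothetical node body: keep columns whose null fraction is <= threshold."""
    # isna().mean() gives the fraction of nulls per column, indexed by column name.
    null_fraction = data.isna().mean()
    return data.loc[:, null_fraction <= threshold]


def test_drop_sparse_columns_removes_mostly_null_column():
    # "a" is 1/3 null and "b" is 2/3 null, so only "b" exceeds the 0.5 threshold.
    test_data = pd.DataFrame({"a": [1, 2, None], "b": [1, None, None]})
    got = drop_sparse_columns(test_data, 0.5)
    expected = test_data.drop(columns="b")
    pd.testing.assert_frame_equal(got, expected)


def test_drop_sparse_columns_keeps_column_at_threshold():
    # Assumed edge case: a column exactly at the threshold is not "more than" it.
    test_data = pd.DataFrame({"a": [1.0, None], "b": [1.0, 2.0]})
    got = drop_sparse_columns(test_data, 0.5)
    pd.testing.assert_frame_equal(got, test_data)
```

Because the node is a plain function, the tests call it directly with no pipeline, catalog, or runner involved, and pytest picks them up from any test_*.py file.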
Beyond this, it's also possible to add a level of integration testing for your pipelines, where you run the pipeline on sample data and check that you get the expected behaviors (but this is usually less granular than unit tests, and often misses nastier logic bugs as a result).
Nok Lam Chan
04/06/2023, 9:30 AM