How do you write unit tests for kedro nodes? I am ...
# questions
j
How do you write unit tests for kedro nodes? I am doing data transformations in my pipeline and would like to make sure that they are done correctly. Is it even necessary to write tests for that?
d
Is it even necessary to write tests for that?
Testing is always necessary! Nodes are just Python functions, so it makes sense to test them as such. For example, if you have a `drop_sparse_columns(data: pd.DataFrame, threshold: float)` node that removes any columns with more than `threshold` fraction null values, you could start off by writing a test that:
1. Takes a basic dataframe input (e.g. `test_data = pd.DataFrame({"a": [1, 2, None], "b": [1, None, None]})`)
2. Calls the node: `got = drop_sparse_columns(test_data, 0.5)`
3. Knows what to expect: `expected = test_data.drop(columns="b")`
4. Makes sure it worked properly: `pd.testing.assert_frame_equal(got, expected)` (a plain `assert got == expected` raises a `ValueError` for DataFrames, because `==` compares element-wise)
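Put together, those four steps might look like the sketch below. The body of `drop_sparse_columns` is only an assumed stand-in so the example runs on its own; in a real project you would import the node from your pipeline's `nodes.py` instead. The test function name is also just illustrative.

```python
import pandas as pd


def drop_sparse_columns(data: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Assumed stand-in for the node: drop columns whose null fraction exceeds threshold."""
    null_fraction = data.isna().mean()  # per-column fraction of null values
    return data.loc[:, null_fraction <= threshold]


def test_drop_sparse_columns_removes_mostly_null_column():
    # 1. Basic input: "a" is 1/3 null, "b" is 2/3 null
    test_data = pd.DataFrame({"a": [1, 2, None], "b": [1, None, None]})
    # 2. Call the node like any other Python function
    got = drop_sparse_columns(test_data, 0.5)
    # 3. Only "b" exceeds the 0.5 threshold
    expected = test_data.drop(columns="b")
    # 4. Compare DataFrames with the pandas helper, not with ==
    pd.testing.assert_frame_equal(got, expected)
```

A test runner such as pytest will pick up any function named `test_*`, so no Kedro-specific setup is needed for this kind of test.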
Of course, as you hit more edge cases/complicated logic, you add more tests.
Beyond this, it's also possible to add a level of integration testing for your pipelines, where you run the pipeline on sample data and check that you get the expected behavior (but this is usually less granular than unit tests, and often misses nastier logic bugs as a result).
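As a rough idea of what "more tests" could mean for the same node, here are a few edge cases written the same way. The node body is again an assumed stand-in, and the chosen cases (boundary value, all-null column, input left untouched) are just examples of what might matter for your data.

```python
import pandas as pd


def drop_sparse_columns(data: pd.DataFrame, threshold: float) -> pd.DataFrame:
    # Assumed stand-in for the real node, as described in the thread
    return data.loc[:, data.isna().mean() <= threshold]


def test_column_exactly_at_threshold_is_kept():
    # Exactly 50% null: "more than threshold" means this column stays
    data = pd.DataFrame({"a": [1.0, None]})
    got = drop_sparse_columns(data, 0.5)
    pd.testing.assert_frame_equal(got, data)


def test_all_null_column_is_dropped():
    data = pd.DataFrame({"a": [1.0, 2.0], "b": [None, None]})
    got = drop_sparse_columns(data, 0.5)
    assert list(got.columns) == ["a"]


def test_input_is_not_mutated():
    # Nodes that modify their inputs in place can cause surprises downstream
    data = pd.DataFrame({"a": [1.0, 2.0], "b": [None, None]})
    original = data.copy()
    drop_sparse_columns(data, 0.5)
    pd.testing.assert_frame_equal(data, original)
```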