#questions

Iñigo Hidalgo

06/08/2023, 5:01 PM
Has something been done around type checking in kedro pipelines? Could be an interesting option for ensuring data correctness

Nok Lam Chan

06/08/2023, 5:11 PM
What sort of type checking are you thinking about?

Iñigo Hidalgo

06/08/2023, 5:23 PM
similar to regular type checking. for example, a basic implementation could be: if one node has a return type of pd.DataFrame but the node which consumes from it has an input type of pd.Series, raise an error or something along those lines. more thorough checks could also compare catalog dataset types against the nodes' expectations
what made me think of this was that I wrote a node which expected a string kwarg but in my config I wrote it as
config_key:
  - config_value
and it took me a little while to track the issue down
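The basic cross-node check described above could look roughly like the sketch below. It is only an illustration under stated assumptions: it does not use Kedro's API at all, `NodeSpec` and `find_type_mismatches` are hypothetical names, inputs are matched to parameters by position, and annotations are compared by exact equality (no subclasses, unions, or generics).

```python
import inspect
from typing import Any, Callable, Dict, List, Tuple, get_type_hints

# Hypothetical stand-in for a Kedro node (NOT Kedro's API):
# (function, input dataset names, output dataset name).
NodeSpec = Tuple[Callable[..., Any], List[str], str]


def find_type_mismatches(nodes: List[NodeSpec]) -> List[str]:
    """Compare each producer's return annotation with its consumers' parameter annotations."""
    # First pass: record the annotated return type of every produced dataset.
    produced: Dict[str, Tuple[str, Any]] = {}
    for func, _, output in nodes:
        hints = get_type_hints(func)
        if "return" in hints:
            produced[output] = (func.__name__, hints["return"])

    # Second pass: check every annotated input against its producer.
    problems: List[str] = []
    for func, inputs, _ in nodes:
        hints = get_type_hints(func)
        params = list(inspect.signature(func).parameters)
        for param, dataset in zip(params, inputs):  # positional mapping (assumption)
            if dataset not in produced or param not in hints:
                continue  # unannotated, or not produced by another node
            producer, returned = produced[dataset]
            expected = hints[param]
            if returned != expected:  # naive exact-match comparison
                problems.append(
                    f"{dataset}: {producer} returns {returned!r}, "
                    f"but {func.__name__} expects {expected!r} for '{param}'"
                )
    return problems


def load_names() -> list:
    return ["a", "b"]


def shout(name: str) -> str:
    return name.upper()


pipeline = [(load_names, [], "names"), (shout, ["names"], "shouted")]
mismatches = find_type_mismatches(pipeline)
for line in mismatches:
    print(line)
```

A real implementation would have to read the node graph from the pipeline object and decide how to treat subclasses and generic types, which is where most of the complexity would come from.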

Nok Lam Chan

06/08/2023, 5:28 PM
You may want to use a type checker. Do you already use something like mypy? It is a static type checker, though. I am not familiar with runtime type checkers in the Python space.

Iñigo Hidalgo

06/08/2023, 5:31 PM
we do partially use mypy in its less strict mode, although most of our functions aren't properly typed. the problem with existing static type checkers is that they cannot traverse kedro node dependencies, so they can't ensure type correctness between nodes.
i think any implementation would be quite complex and incomplete anyways

Nok Lam Chan

06/08/2023, 9:51 PM
I see what you mean. I wonder whether simple type checking would be enough for your case, though. It would have caught the problem, since the input expects a string but a list was provided.
On further thought, you also don’t really need the dependency graph, because you only need to check the function itself.
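Checking only the function itself could be done at call time with a small decorator, roughly as sketched below. This is a minimal illustration using only the standard library, not an existing Kedro feature; `check_types` and `build_label` are hypothetical names, and only plain class annotations such as `str` are checked.

```python
import functools
import inspect
from typing import get_origin, get_type_hints


def check_types(func):
    """Raise TypeError when an argument is not an instance of its annotated class."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; parameterized generics
            # (for example list[str]) and unannotated parameters are skipped.
            is_plain_class = isinstance(expected, type) and get_origin(expected) is None
            if is_plain_class and not isinstance(value, expected):
                raise TypeError(
                    f"{func.__name__}: '{name}' expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_types
def build_label(config_key: str) -> str:
    # Hypothetical node function that expects a string parameter.
    return f"label:{config_key}"
```

With this in place, passing the YAML list from the config above (`["config_value"]`) fails immediately with a `TypeError` naming the parameter, instead of surfacing later inside the node.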

Flavien

06/09/2023, 7:19 PM
@Iñigo Hidalgo Maybe you could use a mix of pandera (for dataframe-like objects) and pydantic (for parameter-like inputs) to add validation on the inputs of your nodes.
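For the parameter side, a pydantic model could be sketched as below. It assumes pydantic is installed; `NodeParams` and `validate_params` are hypothetical names, and the field name simply mirrors the `config_key` example earlier in the thread. pandera would play the analogous role for dataframes and is not shown here.

```python
from pydantic import BaseModel, ValidationError


class NodeParams(BaseModel):
    """Hypothetical parameter schema for a node."""

    config_key: str


def validate_params(raw: dict) -> NodeParams:
    # Raises pydantic.ValidationError when a value does not fit the schema.
    return NodeParams(**raw)


good = validate_params({"config_key": "config_value"})

try:
    # Same shape as the YAML list from the config example above.
    validate_params({"config_key": ["config_value"]})
    list_rejected = False
except ValidationError:
    list_rejected = True
```

Calling `validate_params` at the top of a node would turn the list-instead-of-string mistake into an explicit validation error that names the offending field.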