Inger van Boeijen
09/11/2023, 10:01 PM

Deepyaman Datta
09/12/2023, 4:06 AM
pd.DataFrame if the function expects that?

Inger van Boeijen
09/12/2023, 7:21 AM

Iñigo Hidalgo
09/12/2023, 8:23 AM

datajoely
09/12/2023, 8:45 AM
pa.check_types decorator
• This technically doesn’t really rely on Kedro for anything but calling the function and therefore it’s somewhat independent of the catalog
• The pattern I like most:
nodes
|_ sales_nodes.py <- Where you declare your nodes
|_ schemas <- where you store your schema classes to import
   |_ customer_schemas.py
   |_ product_schemas.py
Deepyaman Datta
09/12/2023, 2:24 PM

Iñigo Hidalgo
09/12/2023, 2:26 PM
mypy-type validation instead of pandera
Deepyaman Datta
09/12/2023, 2:36 PM
In a before_node_run hook, you could get the node._func, use inspect to get the argument types, and make sure they match. From a quick search, my suggestion is pretty much in line with an old answer: https://stackoverflow.com/a/19684962/1093967 There may be a more modern way, but it's probably also not wholly necessary.
(You can also probably do it in before_pipeline_run, and iterate through all of the nodes and check parameters.)
Both of the above options should be fairly simple to implement.
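A minimal sketch of the runtime check described above, using only the standard library. The check_input_types helper is an invented name, and the hook wiring in the trailing comments is an assumption based on Kedro's hook spec, not tested code:

```python
import inspect
from typing import get_type_hints


def check_input_types(func, inputs: dict) -> None:
    """Raise TypeError if an input's runtime type doesn't match the
    function's annotation. Simplified: only handles plain classes,
    not generics like DataFrame[Schema] or Optional[...]."""
    hints = get_type_hints(func)
    for name, value in inputs.items():
        expected = hints.get(name)
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{func.__name__}: argument {name!r} expected "
                f"{expected.__name__}, got {type(value).__name__}"
            )


# Hypothetical wiring into a Kedro hook (sketch only; note that the
# inputs dict a hook receives is keyed by dataset name, so mapping
# dataset names back to parameter names is left as an exercise):
#
# from kedro.framework.hooks import hook_impl
#
# class TypeCheckHooks:
#     @hook_impl
#     def before_node_run(self, node, inputs):
#         check_input_types(node._func, inputs)
```

inspect.signature would work equally well here; get_type_hints has the advantage of resolving string annotations (e.g. under `from __future__ import annotations`).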
If you want to validate the types without running the code, like mypy does, I'm sure you'd need to build some sort of extension; this would go beyond hooks. Basically, some sort of static analysis checker that can understand how nodes are called as functions, and also do the parsing from YAML.

Inger van Boeijen
09/12/2023, 2:43 PM

Deepyaman Datta
09/12/2023, 2:43 PM

Inger van Boeijen
09/15/2023, 9:20 AM