# questions
i
Hi Kedro! Just started using Kedro and it looks really nice so far! I'm wondering: is there a way to validate the node input types, including when they are parameters? I do put type hints in the node functions. Can I do it through a hook?
d
Welcome! What do you mean by "validate"? Like, validate the data (using something like Pandera or Great Expectations), or validate that the configured dataset returns a `pd.DataFrame` if the function expects that?
i
The second one. For example, I have a parameter `max_days` that is used in a function `def create_checkpoints(max_days: int)`. I want to make sure that the parameter passed in is indeed an `int` and not a `str`.
i
Watching any replies with interest 😄
d
• For runtime validation, I think the best solution is to annotate the function that your node calls with the `pa.check_types` decorator
• This technically doesn't rely on Kedro for anything but calling the function, so it's somewhat independent of the catalog
• The pattern I like most:
Copy code
nodes
|_ sales_nodes.py <- Where you declare your nodes
|_ schemas <- where you store your schema classes to import
   |_ customer_schemas.py 
   |_ product_schemas.py
Longer term we’re working on a deeper, more kedro native integration https://kedro-org.slack.com/archives/C03RKAQ0MGQ/p1693825200938839
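For illustration, here is a minimal stdlib stand-in for what a `pa.check_types`-style decorator does on the function your node calls. This is a hypothetical sketch, not Pandera itself: the real `pa.check_types` validates DataFrame contents against a Pandera schema, while this toy version only checks plain Python types from the annotations.

```python
import functools
import inspect
from typing import get_type_hints


def check_types(func):
    """Toy decorator in the spirit of pandera's pa.check_types: validates
    each argument's runtime type against the function's annotations.
    (Real pa.check_types validates DataFrame schemas, not just types.)"""
    hints = get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only check plain classes; skip generics and unannotated args
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{func.__name__}: {name!r} should be "
                    f"{expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)
    return wrapper


@check_types
def create_checkpoints(max_days: int) -> list:
    return list(range(max_days))
```

Because the check lives on the function itself, Kedro only needs to call the node function as usual for the validation to run.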
d
@datajoely I think this question is about object type validation, though (e.g. the type of a parameter, or whether something is a pandas DataFrame vs. a Spark DataFrame).
i
@Deepyaman Datta yeah, basically `mypy`-style type validation instead of `pandera`
d
@Inger van Boeijen You can validate this using hooks, at runtime. In a `before_node_run` hook, you could get the `node._func`, use `inspect` to get the argument types, and make sure they match. From a quick search, my suggestion is pretty much in line with an old answer: https://stackoverflow.com/a/19684962/1093967 There may be a more modern way, but it's probably not wholly necessary. (You could also do it in `before_pipeline_run` and iterate through all of the nodes to check their parameters.) Both of the above options should be fairly simple to implement. If you want to validate the types without running the code, like `mypy` does, you'd need to build some sort of extension; this would go beyond hooks. Basically, some sort of static analysis checker that understands how nodes are called as functions and can also parse the YAML.
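Setting the Kedro plumbing aside, the core of the `inspect`-based check suggested above might look like this. The `check_node_inputs` helper is hypothetical (not a Kedro API); in a real `before_node_run` hook you would call it with the node's function and the `inputs` dict the hook receives.

```python
import inspect
from typing import get_type_hints


def check_node_inputs(func, inputs: dict) -> None:
    """Raise TypeError if any input's runtime type conflicts with the
    function's annotation. Only checks plain classes (not generics)."""
    hints = get_type_hints(func)
    sig = inspect.signature(func)
    for name in sig.parameters:
        if name in inputs and name in hints:
            expected = hints[name]
            if isinstance(expected, type) and not isinstance(inputs[name], expected):
                raise TypeError(
                    f"{func.__name__}: parameter {name!r} expected "
                    f"{expected.__name__}, got {type(inputs[name]).__name__}"
                )


# Example node function with type hints
def create_checkpoints(max_days: int) -> list:
    return list(range(max_days))


check_node_inputs(create_checkpoints, {"max_days": 3})  # passes silently
```

Inside a hook class, the call would be roughly `check_node_inputs(node.func, inputs)`, since `before_node_run` receives both the node and its resolved inputs.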
❤️ 1
i
Thanks for all the suggestions! I'll implement it with a decorator in the `before_node_run` hook.
🙌 1
d
Let us know how it goes! It sounds like some others may also be interested in a solution; it could be a cool thing to abstract into its own plugin. 🙂
i
Copy code
@hook_impl
def before_node_run(self, node: Node):
    from typeguard import typechecked
    node.func = typechecked(node.func)
🙌 2
Fixed it like this in the end ☺️