# questions
Hi kedro! Just started using Kedro and it looks really nice so far! Just wondering, is there a way to validate the node input types? Also if they are parameters? I do put type hints in the node functions. I was wondering if I can do it through a hook?
Welcome! What do you mean by "validate"? Like, validate the data (using something like Pandera, Great Expectations), or validate that the configured dataset returns the type of object the function expects?
The second one. For example, I have a parameter max_days that is used in a function def create_checkpoints(max_days: int). I want to make sure that the parameter that is passed is indeed an int and not a string.
Watching any replies with interest 😄
• For runtime validation I think the best solution is to annotate the function that your node calls with a validation decorator
• This technically doesn't really rely on Kedro for anything but calling the function, so it's somewhat independent of the catalog
• The pattern I like most:
|_ sales_nodes.py <- Where you declare your nodes
|_ schemas <- where you store your schema classes to import
   |_ customer_schemas.py 
   |_ product_schemas.py
Longer term we’re working on a deeper, more kedro native integration https://kedro-org.slack.com/archives/C03RKAQ0MGQ/p1693825200938839
@datajoely I think this question is about object type validation, though (e.g. the type of a parameter, or whether something is pandas DataFrame vs. Spark DataFrame).
@Deepyaman Datta yeah, basically type validation instead of data validation
@Inger van Boeijen You can validate this using hooks, at runtime. In a before_node_run hook, you could get the node's function, inspect its signature to get the argument types, and make sure they match. In doing a quick search, my suggestion is pretty in line with an old answer: https://stackoverflow.com/a/19684962/1093967 There may be a more modern way, but it's probably also not wholly necessary. (You can also probably do it once up front, in a hook that runs before the pipeline starts, and iterate through all of the nodes and check parameters.) Both of the above options should be fairly simple to implement. If you want to validate the types without running the code, like a static type checker would, I'm sure you'd need to build some sort of extension; this would go beyond hooks. Basically, some sort of static analysis checker that can understand how nodes are called as functions, and also do the parsing from YAML.
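A minimal sketch of the "get the argument types and make sure they match" step, kept independent of Kedro so it can be run on its own. The helper name check_argument_types is hypothetical, and it only handles plain class hints such as int or str. It also assumes values are keyed by argument name; in a before_node_run hook the inputs are, as far as I know, keyed by dataset name (e.g. params:max_days), so a mapping step would be needed there.

```python
from typing import Any, Callable, Dict, get_origin, get_type_hints


def check_argument_types(func: Callable, kwargs: Dict[str, Any]) -> None:
    """Raise TypeError if a value does not match the plain-class hint of its argument.

    Hypothetical helper: only plain classes (int, str, ...) are checked.
    Parameterized hints such as list[int] or Optional[int] are skipped.
    """
    hints = get_type_hints(func)
    for name, value in kwargs.items():
        expected = hints.get(name)
        is_plain_class = isinstance(expected, type) and get_origin(expected) is None
        if is_plain_class and not isinstance(value, expected):
            raise TypeError(
                f"{func.__name__}: argument {name!r} expected "
                f"{expected.__name__}, got {type(value).__name__}"
            )


# The node function from the question above.
def create_checkpoints(max_days: int) -> list:
    return list(range(max_days))


# An int passes silently; a string such as "7" would raise TypeError.
check_argument_types(create_checkpoints, {"max_days": 7})
```

Note that bool is a subclass of int in Python, so True would pass an int check here; a stricter comparison would be needed to reject it.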
❤️ 1
Thanks for all the suggestions! Will implement it with a decorator in the before node run hook
🙌 1
Let us know how it goes! It sounds like some others may also be interested in a solution; could be a cool thing to abstract into its own plugin. 🙂
@hook_impl
def before_node_run(self, node: Node):
    from typeguard import typechecked
    node.func = typechecked(node.func)
🙌 2
Fixed it like this in the end ☺️