# questions
i
Hi Kedro! Just started using Kedro and it looks really nice so far! I'm wondering: is there a way to validate the node input types, including when they are parameters? I do put type hints in the node functions. Can I do it through a hook?
d
Welcome! What do you mean by "validate"? Like, validate the data (using something like Pandera or Great Expectations), or validate that the configured dataset returns a `pd.DataFrame` if the function expects that?
i
The second one. For example, I have a parameter `max_days` that is used in a function `def create_checkpoints(max_days: int)`. I want to make sure that the parameter passed in is indeed an `int` and not a `str`.
i
Watching any replies with interest 😄
d
• For runtime validation, I think the best solution is to annotate the function that your node calls with the `pa.check_types` decorator
• This technically doesn't rely on Kedro for anything but calling the function, so it's somewhat independent of the catalog
• The pattern I like most:
Copy code
nodes
|_ sales_nodes.py <- Where you declare your nodes
|_ schemas <- where you store your schema classes to import
   |_ customer_schemas.py 
   |_ product_schemas.py
Longer term we’re working on a deeper, more kedro native integration https://kedro-org.slack.com/archives/C03RKAQ0MGQ/p1693825200938839
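For illustration, here is a minimal stdlib stand-in for what a `pa.check_types`-style decorator does on the function your node calls. This is a hypothetical sketch, not Pandera itself: the real `pa.check_types` validates DataFrame contents against a Pandera schema, while this toy version only checks plain Python types from the annotations.

```python
import functools
import inspect
from typing import get_type_hints


def check_types(func):
    """Toy decorator in the spirit of pandera's pa.check_types: validates
    each argument's runtime type against the function's annotations.
    (Real pa.check_types validates DataFrame schemas, not just types.)"""
    hints = get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only check plain classes; skip generics and unannotated args
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{func.__name__}: {name!r} should be "
                    f"{expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)
    return wrapper


@check_types
def create_checkpoints(max_days: int) -> list:
    return list(range(max_days))
```

Because the check lives on the function itself, Kedro only needs to call the node function as usual for the validation to run.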
d
@datajoely I think this question is about object type validation, though (e.g. the type of a parameter, or whether something is a pandas DataFrame vs. a Spark DataFrame).
i
@Deepyaman Datta yeah, basically `mypy`-style type validation instead of `pandera`
d
@Inger van Boeijen You can validate this using hooks, at runtime. In a `before_node_run` hook, you could get the `node._func`, use `inspect` to get the argument types, and make sure they match. From a quick search, my suggestion is pretty much in line with an old answer: https://stackoverflow.com/a/19684962/1093967 There may be a more modern way, but it's probably not wholly necessary. (You could also do it in `before_pipeline_run` and iterate through all of the nodes to check their parameters.) Both of the above options should be fairly simple to implement. If you want to validate the types without running the code, like `mypy` does, you'd need to build some sort of extension; this would go beyond hooks. Basically, some sort of static analysis checker that understands how nodes are called as functions and can also parse the YAML.
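Setting the Kedro plumbing aside, the core of the `inspect`-based check suggested above might look like this. The `check_node_inputs` helper is hypothetical (not a Kedro API); in a real `before_node_run` hook you would call it with the node's function and the `inputs` dict the hook receives.

```python
import inspect
from typing import get_type_hints


def check_node_inputs(func, inputs: dict) -> None:
    """Raise TypeError if any input's runtime type conflicts with the
    function's annotation. Only checks plain classes (not generics)."""
    hints = get_type_hints(func)
    sig = inspect.signature(func)
    for name in sig.parameters:
        if name in inputs and name in hints:
            expected = hints[name]
            if isinstance(expected, type) and not isinstance(inputs[name], expected):
                raise TypeError(
                    f"{func.__name__}: parameter {name!r} expected "
                    f"{expected.__name__}, got {type(inputs[name]).__name__}"
                )


# Example node function with type hints
def create_checkpoints(max_days: int) -> list:
    return list(range(max_days))


check_node_inputs(create_checkpoints, {"max_days": 3})  # passes silently
```

Inside a hook class, the call would be roughly `check_node_inputs(node.func, inputs)`, since `before_node_run` receives both the node and its resolved inputs.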
❤️ 1
i
Thanks for all the suggestions! I'll implement it with a decorator in the `before_node_run` hook.
🙌 1
d
Let us know how it goes! It sounds like some others may also be interested in a solution; it could be a cool thing to abstract into its own plugin. 🙂
i
Copy code
@hook_impl
def before_node_run(self, node: Node):
    from typeguard import typechecked
    node.func = typechecked(node.func)
🙌 2
Fixed it like this in the end ☺️