#questions

Dustin van Weersel

01/24/2024, 3:16 PM
Hey! I keep running into technical difficulties implementing a feature I want, so I thought I'd ask around here for help. What I would like to do is run Great Expectations checks on the output data of a node. For this, I created an `after_node_run` hook. Within this hook, I would like to load the config that our GE implementation needs and pass the output dataset name. Does this make sense, or would anyone suggest a different method? Currently, I've implemented the config as a parameter because I would like to use the templating function of the config loader. However, I'm now running into the fact that I would need to add that parameter to every function and function definition. Is there a global object I could attach this to so I only have to load it once? Do properties of a hook persist throughout a session? Any help or advice would be appreciated 🙂.

Nok Lam Chan

01/24/2024, 3:25 PM
Could you show some pseudocode of how it works?

Dustin van Weersel

01/24/2024, 4:03 PM
Thanks for the quick reply 🙂 With the statefulness of the hooks in mind, a rough pseudocode would look something like this:
from typing import Any

from kedro.framework.context import KedroContext
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog
from kedro.pipeline.node import Node

from own_package import RunGECheck


class GEHook:

    @hook_impl
    def after_context_created(self, context: KedroContext) -> None:
        # Load the GE config once and keep it on the hook instance
        config_loader = context.config_loader
        ge_config = config_loader["parameters"]["great_expectation"]

        self.ge_config = ge_config

    @hook_impl
    def after_node_run(
        self,
        node: Node,
        catalog: DataCatalog,
        inputs: dict[str, Any],
        outputs: dict[str, Any],
        is_async: bool,
        session_id: str,
    ) -> None:
        output_dataset_name: str = ...  # Get from input
        output_dataset: SparkDataFrame = ...  # Get from input

        # Class uses dataframe + dataset name to retrieve GE suites and run them. Also stores results in a given path
        RunGECheck(
            dataset_name=output_dataset_name,
            config=self.config,
            dataframe=output_dataset,
        )

Nok Lam Chan

01/24/2024, 4:04 PM
I think this should work. `self.config` should be `self.ge_config` in this context, I guess?

Dustin van Weersel

01/24/2024, 4:04 PM
Yes!
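
For reference, the corrected pattern from this thread could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the actual implementation: `run_ge_check` is a hypothetical stand-in for `RunGECheck` from `own_package`, the `suite_dir` key and the fake context are invented for illustration, and the `hook_impl` fallback exists only so the snippet runs without Kedro installed. It shows the config being loaded once in `after_context_created` and reused from `self.ge_config` for every node output.

```python
from types import SimpleNamespace
from typing import Any

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback (assumption, illustration only): a no-op decorator so the
    # sketch still runs when Kedro is not installed.
    def hook_impl(func):
        return func


def run_ge_check(dataset_name: str, config: dict[str, Any], dataframe: Any) -> dict[str, Any]:
    """Hypothetical stand-in for RunGECheck; a real version would run GE suites."""
    return {
        "dataset_name": dataset_name,
        "suite_dir": config.get("suite_dir"),  # hypothetical config key
        "checked": True,
    }


class GEHook:
    def __init__(self) -> None:
        # State lives on the hook instance, so it is shared across hook calls.
        self.ge_config: dict[str, Any] = {}
        self.results: list[dict[str, Any]] = []

    @hook_impl
    def after_context_created(self, context) -> None:
        # Load the templated parameters once, when the context is created.
        self.ge_config = context.config_loader["parameters"]["great_expectation"]

    @hook_impl
    def after_node_run(self, node, outputs: dict[str, Any]) -> None:
        # `outputs` maps dataset names to the data the node just produced.
        for dataset_name, data in outputs.items():
            self.results.append(
                run_ge_check(dataset_name=dataset_name, config=self.ge_config, dataframe=data)
            )


# Simulate the two hook calls with a fake context (illustration only).
hook = GEHook()
fake_context = SimpleNamespace(
    config_loader={"parameters": {"great_expectation": {"suite_dir": "conf/base/expectations"}}}
)
hook.after_context_created(context=fake_context)
hook.after_node_run(node=None, outputs={"model_input_table": [1, 2, 3]})
```

In a real project the hook would be registered as an instance in `settings.py`, for example `HOOKS = (GEHook(),)`, and since one instance receives all hook calls, attributes set in `after_context_created` should still be available in `after_node_run`. The hook methods here declare only the arguments they use, which the hook system allows.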

Nok Lam Chan

01/24/2024, 4:09 PM
Awesome! If you're interested, we want to publish a blog post on using GE with Kedro. We used to have an example GE hook, but it hasn't been updated for a while (maybe it can give you some inspiration for your hooks too).

Dustin van Weersel

01/24/2024, 4:13 PM
Thanks for your help 🙂 Once I have everything working, I'll look into sharing a working example or collaborating on a blog post.

Nok Lam Chan

01/24/2024, 4:21 PM
Great :) shout if you run into more issues