Dustin van Weersel
01/24/2024, 3:16 PM
after_node_run hook. Within this hook, I would like to load the config that our GE implementation needs and pass the output dataset name. Does this make sense or would anyone suggest a different method?
Currently, I've implemented the config as a parameter because I would like to use the templating function of the config loader. However, I'm now running into the problem that I would need to add that parameter to every function and function definition. Is there a global object I could attach this to so that I only load it once? Do properties of a hook persist throughout a session?
Any help or advice would be appreciated 🙂
Nok Lam Chan
01/24/2024, 3:25 PM
Nok Lam Chan
01/24/2024, 3:25 PM
Dustin van Weersel
01/24/2024, 4:03 PM
from typing import Any

from kedro.framework.context import KedroContext
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog
from kedro.pipeline.node import Node
from pyspark.sql import DataFrame as SparkDataFrame  # assumed source of SparkDataFrame

from own_package import RunGECheck


class GEHook:
    @hook_impl
    def after_context_created(
        self,
        context: KedroContext
    ) -> None:
        # Load the GE config once and keep it on the hook instance
        config_loader = context.config_loader
        ge_config = config_loader["parameters"]["great_expectation"]
        self.ge_config = ge_config

    @hook_impl
    def after_node_run(
        self,
        node: Node,
        catalog: DataCatalog,
        inputs: dict[str, Any],
        outputs: dict[str, Any],
        is_async: bool,
        session_id: str,
    ) -> None:
        output_dataset_name: str = ...  # Get from input
        output_dataset: SparkDataFrame = ...  # Get from input
        # Class uses dataframe + dataset name to retrieve GE suites and run them. Also stores results in a given path
        RunGECheck(
            dataset_name=output_dataset_name,
            config=self.config,
            dataframe=output_dataset
        )
Nok Lam Chan
01/24/2024, 4:04 PM
self.config should be self.ge_config in this context I guess?
Dustin van Weersel
01/24/2024, 4:04 PM
Nok Lam Chan
01/24/2024, 4:09 PM
Dustin van Weersel
01/24/2024, 4:13 PM
Nok Lam Chan
01/24/2024, 4:21 PM
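
For reference, a minimal sketch of the pattern discussed in this thread: load the GE config once in after_context_created, keep it on the hook instance, and reuse it in after_node_run. It assumes the hook is registered as a single instance (for example through HOOKS in settings.py), so attributes set in one hook call are still there in later calls of the same session. The run_check argument is a hypothetical stand-in for RunGECheck, whose real signature is not shown in the thread. The try/except fallback only exists so the sketch runs without Kedro installed. The shortened after_node_run signature relies on pluggy letting an implementation accept a subset of the hook spec's arguments.

```python
from typing import Any, Callable

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch still runs where Kedro is not installed.
    def hook_impl(func):
        return func


class GEHook:
    """Loads the GE config once, then reuses it after every node run."""

    def __init__(self, run_check: Callable[..., Any]) -> None:
        # `run_check` is a hypothetical stand-in for the thread's RunGECheck.
        self.run_check = run_check
        self.ge_config: dict[str, Any] = {}

    @hook_impl
    def after_context_created(self, context) -> None:
        # Stored on the instance, so later hook calls can read it.
        self.ge_config = context.config_loader["parameters"]["great_expectation"]

    @hook_impl
    def after_node_run(self, node, outputs: dict[str, Any]) -> None:
        # `outputs` maps each output dataset name to the data the node returned.
        for dataset_name, data in outputs.items():
            self.run_check(
                dataset_name=dataset_name,
                config=self.ge_config,
                dataframe=data,
            )
```

In a project this would be registered once, along the lines of HOOKS = (GEHook(run_check=RunGECheck),) in settings.py. Taking the dataset name and data from outputs avoids passing the config or the dataset name through every node function.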