Flavien
08/09/2023, 4:45 PM
data/ through the catalog in some "unit tests" (does it make sense?). To do so, I thought about using the test_run example from the documentation (created from kedro new) to load the catalog through the context. Is that the correct way to do it? Thanks in advance.

Jose Nuñez
08/09/2023, 6:01 PM

Flavien
08/09/2023, 8:11 PMdef filter_out_flagged_data(
hourly_measures: DataFrame, excluded_flags: list[str]
) -> DataFrame:
return hourly_measures.filter(
~arrays_overlap(
col("flags_array"), array([lit(flag) for flag in excluded_flags])
)
).drop(col("flags_array"))
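[Editor's note] For intuition, here is a pure-Python sketch of the same exclusion logic, not the author's code: arrays_overlap is true when the two arrays share at least one element, so the filter keeps rows whose flags_array contains no excluded flag, and the column is then dropped. The row shape and sample values are assumptions for illustration.

```python
# Pure-Python sketch of the Spark logic above; rows modelled as dicts,
# field names mirror the snippet ("flags_array" etc.).

def filter_out_flagged_rows(rows: list[dict], excluded_flags: list[str]) -> list[dict]:
    excluded = set(excluded_flags)
    kept = []
    for row in rows:
        # ~arrays_overlap(...): keep rows sharing no flag with the excluded set
        if not excluded.intersection(row.get("flags_array", [])):
            # .drop(col("flags_array")): remove the flags field from kept rows
            kept.append({k: v for k, v in row.items() if k != "flags_array"})
    return kept

measures = [
    {"hour": 0, "value": 1.2, "flags_array": ["OK"]},
    {"hour": 1, "value": 3.4, "flags_array": ["OK", "SENSOR_FAULT"]},
]
print(filter_out_flagged_rows(measures, ["SENSOR_FAULT"]))
# [{'hour': 0, 'value': 1.2}]
```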
To do so, I would like to use a test dataset from a JSON file, defined in the catalog as:
hourly_measures:
  type: spark.SparkDataSet
  filepath: data/01_raw/hourly_measures_test.json
  file_format: json
  load_args:
    header: True
    multiline: True
It would be a pytest fixture using the catalog to load the data instead of including the JSON/dict content inside the fixture itself.

Jose Nuñez
08/10/2023, 11:58 AM

Nok Lam Chan
08/10/2023, 12:21 PM
import pytest
from kedro.framework.session import KedroSession

@pytest.fixture
def context():
    session = KedroSession.create(..., env="test")
    return session.load_context()

@pytest.fixture
def catalog(context):
    return context.catalog

def test_my_dataset(catalog):
    dataset = catalog.load("some_json_dataset")
    ...  # do your work here
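[Editor's note] The snippet above relies on env="test". The usual Kedro pattern behind that (an assumption about this project, not shown in the thread) is a test-environment config that overrides the base catalog entry with a small fixture file, along these lines:

```yaml
# conf/test/catalog.yml -- hypothetical override for the "test" environment;
# same dataset name as in conf/base, pointed at a small test fixture
hourly_measures:
  type: spark.SparkDataSet
  filepath: data/01_raw/hourly_measures_test.json
  file_format: json
  load_args:
    multiline: True
```

With an override like this in place, catalog.load("hourly_measures") inside the tests reads the fixture file without touching the base environment's data.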
@Jose Nuñez @Flavien

Flavien
08/10/2023, 12:23 PM

Nok Lam Chan
08/10/2023, 12:24 PM