# questions
f
Hi fellows, I would like to use example datasets defined in `data/` through the catalog in some "unit tests" (does it make sense?). To do so, I thought about using the `test_run` example from the documentation (created from `kedro new`) to load the `catalog` through the context. Is it the correct way to do it? Thanks in advance.
j
Hi Flavien! Can you give an example of exactly what you want to test in your data? Maybe you can use Hooks + Great Expectations instead?
f
Let's say I want to unit test a node/function like this one:
from pyspark.sql import DataFrame
from pyspark.sql.functions import array, arrays_overlap, col, lit


def filter_out_flagged_data(
    hourly_measures: DataFrame, excluded_flags: list[str]
) -> DataFrame:
    # Keep rows whose flags_array shares no element with excluded_flags,
    # then drop the flags_array column.
    return hourly_measures.filter(
        ~arrays_overlap(
            col("flags_array"), array([lit(flag) for flag in excluded_flags])
        )
    ).drop(col("flags_array"))
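In plain Python, the logic is roughly this (hypothetical row data, just to illustrate what the `arrays_overlap` filter does here; this is not the Spark code itself):

```python
# Plain-Python sketch of the node's logic (illustrative only):
# keep rows whose "flags_array" shares no element with excluded_flags,
# then drop the "flags_array" key.
def filter_out_flagged_rows(rows: list[dict], excluded_flags: list[str]) -> list[dict]:
    excluded = set(excluded_flags)
    return [
        {k: v for k, v in row.items() if k != "flags_array"}
        for row in rows
        if not excluded.intersection(row.get("flags_array", []))
    ]

# Hypothetical hourly measures:
rows = [
    {"hour": 0, "value": 10.0, "flags_array": ["ok"]},
    {"hour": 1, "value": 20.0, "flags_array": ["suspect"]},
    {"hour": 2, "value": 30.0, "flags_array": []},
]
print(filter_out_flagged_rows(rows, ["suspect"]))
# rows for hours 0 and 2 survive, without the flags_array key
```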
To do so, I would like to use a test dataset from a JSON file which is defined in a catalog through
hourly_measures:
  type: spark.SparkDataSet
  filepath: data/01_raw/hourly_measures_test.json
  file_format: json
  load_args:
    header: True
    multiline: True
It would be a pytest fixture using the catalog to load the data instead of including the JSON/dict content inside the fixture itself.
j
@Nok Lam Chan how would you approach this?
n
So what I would do is have a catalog fixture (the syntax may not be exactly correct, I am just typing it out here):
import pytest

from kedro.framework.session import KedroSession


@pytest.fixture
def context():
    session = KedroSession.create(..., env="test")
    return session.load_context()


@pytest.fixture
def catalog(context):
    return context.catalog


def test_my_dataset(catalog):
    dataset = catalog.load("some_json_dataset")
    ...  # do your work here
@Jose Nuñez @Flavien
👍 1
f
Cool, that's what I had in mind. Thanks for the confirmation!
n
Meanwhile, there is some discussion about creating a Testing client to provide a Python API for manipulating a Kedro project in an easier way. It's a very early discussion right now; not sure if the community needs this.