# questions
Hi fellows, I would like to use example datasets defined in the catalog in some "unit tests" (does it make sense?). To do so, I thought about using the example from the documentation (created from `kedro new`) to load the datasets through the context. Is it the correct way to do it? Thanks in advance.
Hi Flavien! Can you give an example of exactly what you want to test in your data? Maybe you can use Hooks + Great Expectations instead?
Let's say I want to unit test such a node/function
from pyspark.sql import DataFrame
from pyspark.sql.functions import array, arrays_overlap, col, lit

def filter_out_flagged_data(
    hourly_measures: DataFrame, excluded_flags: list[str]
) -> DataFrame:
    # Keep only rows whose flags_array shares no element with excluded_flags
    # (the original snippet was truncated; arrays_overlap is one plausible completion).
    return hourly_measures.filter(
        ~arrays_overlap(
            col("flags_array"), array(*[lit(flag) for flag in excluded_flags])
        )
    )
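For reference, here is the intended behavior sketched on plain Python data, without needing a Spark session (the function name and row shape are illustrative, not from the original thread):

```python
# Plain-Python analogue of the Spark node above: keep rows whose
# flags do not intersect the excluded flags.
def filter_out_flagged_rows(rows: list[dict], excluded_flags: list[str]) -> list[dict]:
    excluded = set(excluded_flags)
    return [row for row in rows if not (set(row["flags_array"]) & excluded)]

rows = [
    {"id": 1, "flags_array": ["OK"]},
    {"id": 2, "flags_array": ["BAD"]},
]
filter_out_flagged_rows(rows, ["BAD"])  # keeps only the row with id 1
```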
To do so, I would like to use a test dataset from a JSON file which is defined in a catalog through
hourly_measures_test:  # dataset name is illustrative; it was cut off in the original
  type: spark.SparkDataSet
  filepath: data/01_raw/hourly_measures_test.json
  file_format: json
  load_args:
    header: True
    multiline: True
It would be a pytest fixture using the catalog to load the data instead of including the JSON/dict content inside the fixture itself.
@Nok Lam Chan how would you approach this?
So what I would do is have a catalog fixture (syntax may not be exactly correct, I am just typing it out here):
import pytest
from kedro.framework.session import KedroSession

@pytest.fixture
def context():
    session = KedroSession.create(..., env="test")
    return session.load_context()

@pytest.fixture
def catalog(context):
    return context.catalog

def test_my_dataset(catalog):
    dataset = catalog.load("some_json_dataset")
    ... # do your work here
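The `env="test"` argument assumes a dedicated test environment with its own catalog overrides, e.g. a `conf/test/catalog.yml` along these lines (dataset name and path are illustrative):

```yaml
# conf/test/catalog.yml: catalog overrides loaded when env="test"
some_json_dataset:
  type: spark.SparkDataSet
  filepath: data/01_raw/hourly_measures_test.json
  file_format: json
  load_args:
    header: True
    multiline: True
```

This way the production catalog stays untouched, and the fixture picks up the test fixtures' file paths automatically.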
@Jose Nuñez @Flavien
Cool, that's what I had in mind. Thanks for the confirmation!
Meanwhile there are some discussions about creating a testing client to provide a Python API for manipulating a Kedro project in an easier way - very early discussion right now, not sure if the community needs this.