Is the `yaml` Loader part of the `ConfigLoader` somewhat configurable? (Kedro #questions)

Filip Panovski

11/17/2022, 10:03 AM

Hello! Is the `yaml` Loader part of the `ConfigLoader` somewhat configurable in any meaningful way? Or does Kedro implement its own `yaml` parsing mechanism? We're trying to use some custom filtering that gets passed via `load_args` to `kedro.extras.datasets.dask.ParquetDataSet`. Specifically, we want to be able to do something like:

```yaml
# catalog.yml
raw_data:
  type: dask.ParquetDataSet
  filepath: 's3://...'
  load_args:
    filters:
      - !!python/tuple ['year', '=', '2022']
      - !!python/tuple ['day', '=', '3']
      - !!python/tuple ['id', '=', 'someVal']
```
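For readers new to this filter format: a flat list of `(column, op, value)` tuples is read as a conjunction, so every predicate must hold, and dask can use predicates on partition columns to avoid reading non-matching partitions. The following is a simplified, stdlib-only sketch of that AND semantics for equality checks against Hive-style partition paths such as `year=2022/month=07/day=3`. The helper names and paths are made up for illustration; dask's real implementation (through its parquet engines) supports more operators and a nested OR-of-ANDs form.

```python
# Simplified illustration of conjunctive (AND) filter semantics on
# Hive-style partition paths. This is NOT dask's actual implementation.
from typing import Dict, List, Tuple

Predicate = Tuple[str, str, str]


def parse_partition_path(path: str) -> Dict[str, str]:
    """Turn 'year=2022/month=05/day=3' into {'year': '2022', 'month': '05', 'day': '3'}."""
    keys: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            keys[key] = value
    return keys


def matches(partition: Dict[str, str], filters: List[Predicate]) -> bool:
    """Return True if every equality predicate holds (AND semantics).

    Columns absent from the partition path are not decided here; a real
    reader would check them against the row data instead.
    """
    for column, op, value in filters:
        if op not in ("=", "=="):
            raise ValueError(f"this sketch only handles equality, got {op!r}")
        if column in partition and partition[column] != value:
            return False
    return True


# Hypothetical partitions: keep every day=3 partition of any month in 2022.
filters = [("year", "=", "2022"), ("day", "=", "3")]
paths = [
    "year=2022/month=01/day=3",
    "year=2022/month=07/day=3",
    "year=2022/month=07/day=4",
    "year=2021/month=01/day=3",
]
selected = [p for p in paths if matches(parse_partition_path(p), filters)]
print(selected)  # ['year=2022/month=01/day=3', 'year=2022/month=07/day=3']
```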

`dask` (via `filters`, see docs) supports row-filtering on loaded data this way, and `yaml` (via tuple support in `.yml` files) supports the above definition. However, `yaml` unfortunately only supports this with either the non-default `FullLoader` or the `UnsafeLoader` (for controlled environments, see here). Is it possible to configure the `ConfigLoader` to use either of these? An example use case would be to filter only the rows belonging to all `day = 3` partitions of any month in `year = 2022`. I could alternatively write a DataSet that parses this logic from plain string lists, but I was wondering if there's any existing support for something like this.
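The alternative mentioned above, a dataset that accepts filters written as plain YAML lists, comes down to converting those lists into tuples before they reach dask. Below is a minimal sketch, assuming the filters are written in the safe-loadable form `- ['year', '=', '2022']`. The helper `to_filter_tuples` is hypothetical (not part of Kedro or dask); it also accepts a nested form (a list of AND-groups that are ORed together) and rejects anything it cannot interpret.

```python
from typing import Any, List


def _is_predicate(item: Any) -> bool:
    # A single predicate: [column, op, value] with a string column name.
    return isinstance(item, (list, tuple)) and len(item) == 3 and isinstance(item[0], str)


def to_filter_tuples(filters: List[Any]) -> List[Any]:
    """Convert YAML-friendly lists into the tuple-based filter format.

    Flat form   [[col, op, val], ...]        -> [(col, op, val), ...]         (AND)
    Nested form [[[col, op, val], ...], ...] -> [[(col, op, val), ...], ...]  (OR of ANDs)
    """
    if all(_is_predicate(item) for item in filters):
        return [tuple(item) for item in filters]
    converted = []
    for group in filters:
        if not isinstance(group, (list, tuple)) or not all(_is_predicate(p) for p in group):
            raise ValueError(f"cannot interpret filter entry: {group!r}")
        converted.append([tuple(p) for p in group])
    return converted


# Roughly what a safe YAML loader yields for plain lists in catalog.yml.
load_args = {"filters": [["year", "=", "2022"], ["day", "=", "3"], ["id", "=", "someVal"]]}
load_args = {**load_args, "filters": to_filter_tuples(load_args["filters"])}
print(load_args["filters"])
# [('year', '=', '2022'), ('day', '=', '3'), ('id', '=', 'someVal')]
```

In a custom dataset, this conversion would presumably run before the load arguments are forwarded to `dask.dataframe.read_parquet`, which keeps `catalog.yml` loadable without `!!python/tuple` tags.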

Filip Panovski

11/17/2022, 2:47 PM

I came back to this after asking this morning, and it seems that https://github.com/kedro-org/kedro/issues/1011 covers exactly this use case. Unfortunately it is still an open issue, so it does not seem possible to do this at the moment. Custom DataSet it is. :)

Filip Panovski

11/17/2022, 2:49 PM

For anyone else stumbling on this: here's a workaround using a custom dataset, which was already mentioned in a Q&A discussion: https://github.com/kedro-org/kedro/discussions/973. I apologize for duplicating the discussion here; it seems my Google-fu was not up to par.
