Hi ! I want to modify a small thing of the load me...
# questions
a
Hi! I want to make a small change to the load method of a specific Kedro dataset. Is there a way to do this without creating a new custom dataset?
d
The easiest thing I would recommend is creating a custom dataset that subclasses the dataset you want to modify. You can just override load. If you really don't want to create a custom dataset, you can monkeypatch the dataset (e.g. using unittest.mock.patch).
👍 1
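A minimal sketch of the subclassing approach suggested above. To keep it runnable without Kedro installed, `StandInCSVDataset` is a hypothetical stand-in for whichever real dataset you subclass. It assumes the reading logic lives in a `_load` method, as in the monkeypatch one-liner in this thread; some Kedro releases may expose `load` directly, so check the class you are extending.

```python
import csv
import io


class StandInCSVDataset:
    """Hypothetical stand-in for an existing Kedro dataset (not a real Kedro class)."""

    def __init__(self, text):
        # A real dataset would take a filepath; a string keeps the sketch self-contained.
        self._text = text

    def _load(self):
        # Parse the CSV text into a list of dict rows.
        return list(csv.DictReader(io.StringIO(self._text)))

    def _save(self, rows):
        # Not overridden by the subclass below, so it keeps working as before.
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
        self._text = buffer.getvalue()


class UppercaseNamesDataset(StandInCSVDataset):
    """Overrides only the load step; everything else is inherited."""

    def _load(self):
        rows = super()._load()  # reuse the parent's loading logic
        # The "small thing" being modified: upper-case one column after loading.
        return [{**row, "name": row["name"].upper()} for row in rows]


dataset = UppercaseNamesDataset("name,age\nada,36\n")
loaded = dataset._load()
```

Only `_load` is redefined here; `_save` and `__init__` come from the parent unchanged, which is the point of subclassing rather than writing a dataset from scratch.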
n
Or you can just monkeypatch it, simple as dataset._load = your__load_func
d
Or you can just monkeypatch it, simple as dataset._load = your__load_func
(If you're using the dataset instance programmatically; won't work e.g. if using the YAML configuration) I guess maybe you could do the above in a before load hook...? But that's hacky AF. I'd do proper monkeypatching if you really don't want to subclass. 🙈
a
Yes I want to use the YAML configuration, so I agree the best way is to subclass it.
I also have a few questions, if you can guide me 😄
1. When I subclass it, I only need to define load again, right? The rest of the methods will still work the same?
2. Is this code supposed to go in src/extras/datasets? (As in the docs for a new dataset?)
3. If I want the new load to use a new input that comes from the catalog.yml file, how can I achieve this?
m
1. Yes
2. It doesn't matter, you just have to use the full reference to the dataset class in the catalog
3. Just specify your dataset in the type: field (see 2.)
👍 2
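To illustrate why the file location does not matter as long as the full reference is given: a dotted path can be resolved to a class with a plain import. The helper below is an illustrative sketch, not Kedro's actual loading code, and `resolve_class` is a hypothetical name.

```python
import importlib


def resolve_class(dotted_path):
    """Turn 'package.module.ClassName' into the class object (illustrative helper)."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# A stdlib class stands in for a custom dataset such as
# 'my_project.datasets.MyNewDataset' (hypothetical path).
resolved = resolve_class("collections.OrderedDict")
```

Any module that is importable from the project works the same way, so the class can live wherever is convenient.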
a
Hi, super clear, thanks! For point 3, I was actually referring to something different. In the catalog.yml file:
my_catalog:
  type: my_new_dataset
  ... (other inputs that already existed in the original dataset)
  some_new_input: "blabla"
My question is: how can I use a new input that didn't exist in the dataset I am subclassing?
d
My question is: how can I use a new input that didn't exist in the dataset I am subclassing?
You need to override the __init__ method if it's a new dataset argument (like filepath). If it is a load or save argument, it's easier, because those data structures are already dicts, so they are flexible enough to accept new values.
👍 1
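A minimal sketch of overriding `__init__` to accept a new dataset argument. `StandInDataset` is a hypothetical stand-in for the dataset being subclassed, and the example path is made up; `MyNewDataset` and `some_new_input` echo the names from the question. It also assumes that the keys of a catalog entry other than `type` reach the constructor as keyword arguments.

```python
class StandInDataset:
    """Hypothetical stand-in for the dataset being subclassed (not a real Kedro class)."""

    def __init__(self, filepath, load_args=None):
        self._filepath = filepath
        self._load_args = dict(load_args or {})

    def _load(self):
        # A real dataset would read from self._filepath here.
        return f"data from {self._filepath}"


class MyNewDataset(StandInDataset):
    def __init__(self, filepath, some_new_input, load_args=None):
        # Pass the arguments the parent already understands straight through...
        super().__init__(filepath=filepath, load_args=load_args)
        # ...and keep the new one for use in the overridden load.
        self._some_new_input = some_new_input

    def _load(self):
        return f"{super()._load()} with {self._some_new_input}"


# Assumption: catalog keys other than `type` become keyword arguments,
# which is simulated here with a plain dict.
catalog_entry = {"filepath": "data/01_raw/example.csv", "some_new_input": "blabla"}
dataset = MyNewDataset(**catalog_entry)
result = dataset._load()
```

If the value travels through load_args instead, keep in mind that many datasets forward load_args to the underlying reader, so a custom key may need to be popped before calling the parent's load.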
a
Yes, it is a new dataset argument like filepath. Thanks for the answers, will try it 🙂