#questions

Francis Duval

01/23/2024, 6:19 PM
Hello! Specifying a layer for my custom dataset STDataset does not seem to work:
model_name:
  type: ibc_codes.datasets.st_dataset.STDataset
  filepath: data/05_models/model_name
  metadata:
    kedro-viz:
      layer: models
Dataset 'model_name' must only contain arguments valid for the constructor of 'ibc_codes.datasets.st_dataset.STDataset'.
I looked at the documentation to see how I can implement this in my custom dataset class: https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html#how-to-contribute-a-custom-dataset-implementation, as well as the pandas.CSVDataset source code: https://docs.kedro.org/en/0.18.6/_modules/kedro/datasets/pandas/csv_dataset.html#CSVDataSet, but I found no reference to layers. My custom dataset class:
from kedro.io import AbstractDataset
from sentence_transformers import SentenceTransformer


class STDataset(AbstractDataset):
    def __init__(self, filepath: str):
        self.filepath = filepath

    def _load(self):
        return SentenceTransformer(self.filepath)

    def _save(self, model: SentenceTransformer):
        model.save(self.filepath)

    def _describe(self):
        return {
            'filepath': self.filepath
        }

datajoely

01/23/2024, 6:29 PM
If you import the class in a notebook, can you construct it the way you expect?

Takieddine Kadiri

01/23/2024, 6:34 PM
Hello! Maybe you also need to initialise the AbstractDataset with super().__init__(...)

Francis Duval

01/23/2024, 6:49 PM
config_dic = {
    'type': 'ibc_codes.datasets.st_dataset.STDataset',
    'filepath': 'data/05_models/model_name',
    'metadata': {
        'kedro-viz': {
            'layer': 'models'
        }
    }
}

STDataset.from_config(name='test', config=config_dic)
Nope, I get
DatasetError:
STDataset.__init__() got an unexpected keyword argument 'metadata'
Dataset 'test' must only contain arguments valid for the constructor of 'ibc_codes.datasets.st_dataset.STDataset'
But yes, I'll try to do the super init, maybe it's the problem!
Mmm, but the parent class, AbstractDataset, does not have any method __init__()
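On the super().__init__() question, here is a minimal sketch that uses a plain stand-in base class rather than Kedro's AbstractDataset (BaseStandIn, WithSuperInit and try_construct are hypothetical names). Assuming, as noted above, that the parent defines no __init__ of its own, the call still resolves to object.__init__ and succeeds, but it does not change the error: the unexpected keyword is rejected by the subclass's own signature before the constructor body runs.

```python
class BaseStandIn:
    """Hypothetical stand-in for a base class that defines no __init__."""


class WithSuperInit(BaseStandIn):
    def __init__(self, filepath: str):
        # Resolves to object.__init__, which accepts no extra arguments.
        super().__init__()
        self.filepath = filepath


def try_construct(**kwargs):
    """Return the instance, or the TypeError message if construction fails."""
    try:
        return WithSuperInit(**kwargs)
    except TypeError as exc:
        return str(exc)


# Only the declared parameter: construction succeeds.
ok = try_construct(filepath="data/05_models/model_name")

# An extra 'metadata' keyword: rejected, with or without super().__init__().
failed = try_construct(
    filepath="data/05_models/model_name",
    metadata={"kedro-viz": {"layer": "models"}},
)
```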

Nok Lam Chan

01/23/2024, 10:31 PM
Are you running on an old version of Kedro?

Francis Duval

01/23/2024, 11:29 PM
Not the newest one, the one just before it, since I installed it on January 2nd. So yes, I'm running an old version.

Nok Lam Chan

01/24/2024, 12:11 PM
@Francis Duval if you are on 0.19.x or the last few 0.18.x versions it should be fine. Maybe you have an outdated version of kedro-datasets? The metadata field was added a few months ago, so maybe it's a version incompatibility issue.

Francis Duval

01/24/2024, 2:19 PM
I have kedro-datasets 2.0.0 and kedro 0.19.1. I'll try to update them!

Nok Lam Chan

01/24/2024, 3:05 PM
That should be sufficient; maybe I am missing something. Did the super().__init__() suggestion above work?

Francis Duval

01/24/2024, 3:09 PM
Still not working with the new versions. No, I haven't tried this because it seems that the parent class AbstractDataset doesn't have an __init__() method 🤔

Nok Lam Chan

01/24/2024, 3:28 PM
Actually, shouldn't your custom dataset have a metadata parameter? I don't see it here:
class STDataset(AbstractDataset):
    def __init__(self, filepath: str):
        self.filepath = filepath
https://docs.kedro.org/projects/kedro-datasets/en/kedro-datasets-2.0.0/_modules/kedro_datasets/pandas/csv_dataset.html#CSVDataset I think your CSVDataset link above is for quite an old version (0.18.6). I guess you arrived there through Google; the indexing isn't very good. On Read the Docs you can always choose the version of the documentation.
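A minimal sketch of that suggestion, written without importing Kedro so it runs on its own. STDatasetSketch and build_from_config are hypothetical stand-ins: in the real project the class would still subclass AbstractDataset and keep its _load and _save methods, and the helper only imitates the idea that the catalog entry's keys (other than type) reach the constructor as keyword arguments. Storing the value as self.metadata follows what the linked CSVDataset source appears to do.

```python
from typing import Any, Optional


class STDatasetSketch:
    """Stand-in for the custom dataset; the real one subclasses AbstractDataset."""

    def __init__(self, filepath: str, metadata: Optional[dict[str, Any]] = None):
        self.filepath = filepath
        # Accepted so the catalog entry is valid; the dataset itself does not
        # use it. Tools such as kedro-viz are the intended readers.
        self.metadata = metadata

    def _describe(self) -> dict[str, Any]:
        return {"filepath": self.filepath}


def build_from_config(config: dict[str, Any]) -> STDatasetSketch:
    """Hypothetical helper: drop 'type', pass the rest as keyword arguments."""
    kwargs = {key: value for key, value in config.items() if key != "type"}
    return STDatasetSketch(**kwargs)


config = {
    "type": "ibc_codes.datasets.st_dataset.STDataset",
    "filepath": "data/05_models/model_name",
    "metadata": {"kedro-viz": {"layer": "models"}},
}

# With 'metadata' declared in __init__, the same config no longer raises.
dataset = build_from_config(config)
```

Making metadata optional with a default of None keeps existing catalog entries that omit it working unchanged.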

Francis Duval

01/24/2024, 3:52 PM
Seems to be the answer, thanks! I think I got lost in the doc... I'll try that soon and tell you if it works.

Nok Lam Chan

01/24/2024, 4:02 PM
It's not your fault. I did a quick Google search and Google did lead me to the 0.18.6 docs. Cc @Jo Stichbury I remember we tried to index only the latest version; maybe we didn't do it after we split our docs into RTD sub-projects? I clicked the 1st Google search result: https://www.google.com/search?q=csvsdataset+kedro&oq=csvsdataset+kedro&gs_lcrp=EgZja[…]CQgJEAAYDRiABNIBCDQ1MzhqMGo0qAIAsAIA&sourceid=chrome&ie=UTF-8