# questions
f
Hello everyone! I'm new to kedro! Been using it for one week. It is easy to log a csv file into the data catalog:
casa:
  type: pandas.CSVDataset
  filepath: data/01_raw/casa.csv
  load_args:
    encoding: 'ISO-8859-1'
Do you know how I can log a model? For instance, I need to log a SentenceTransformer model:
st_model = SentenceTransformer('models/finetuned/mpnet_16_26epochs')
So I need something like:
st_model:
  type: SentenceTransformerModel
  filepath: data/01_raw/st_model.csv
d
type: pickle.PickleDataset
filepath: ...
backend: pickle|dill|joblib|cloudpickle
👍 1
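For context, a pickle-backed catalog entry boils down to serializing one Python object into one file. A minimal sketch of that round trip using only the standard library (the dict payload and the model.pkl file name are hypothetical stand-ins, and this is not Kedro's actual implementation):

```python
import pickle
import tempfile
from pathlib import Path


def pickle_round_trip(obj, filepath: Path):
    """Write obj to a single file with pickle, then read it back."""
    with filepath.open("wb") as f:
        pickle.dump(obj, f)
    with filepath.open("rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a picklable model object.
    model_like = {"weights": [0.1, 0.2]}
    path = Path(tmp) / "model.pkl"
    restored = pickle_round_trip(model_like, path)
    # The whole object lives in exactly one file on disk.
    single_file = path.is_file()
```

The point of the sketch is the one-object-one-file assumption, which only fits models that are saved that way.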
f
Hey, thanks for your more-than-rapid answer! I just watched your great video!
However, this does not work since my model is not a .pkl file. It is in fact a whole folder containing different files for the weights, the tokenizer, etc.
d
In which case, I think you may have to define a custom dataset. I would wrap this SBERT save command in the Kedro custom dataset save() method and then contribute it back to Kedro!
Only question I have, @Juan Luis: would your WIP huggingface stuff support SBERT?
f
Thanks for your answer! So I defined this custom dataset class:
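from typing import Any

from kedro.io import AbstractDataset
from sentence_transformers import SentenceTransformer


class STDataset(AbstractDataset):
    """Loads and saves a SentenceTransformer model stored as a folder."""

    def __init__(self, filepath: str):
        # Path to the model folder (weights, tokenizer, config, ...)
        self.filepath = filepath

    def _load(self) -> SentenceTransformer:
        return SentenceTransformer(self.filepath)

    def _save(self, model: SentenceTransformer) -> None:
        model.save(self.filepath)

    def _describe(self) -> dict[str, Any]:
        return {"filepath": self.filepath}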
And then, in the catalog:
st_model:
  type: kedro_tuto.datasets.st_dataset.STDataset
  filepath: data/06_models/model_folder
d
Is that working for you?
It looks fine from here.
f
Totally working. Thank you so much!
🥳 1
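A quick way to sanity-check a folder-backed dataset like this is a save/load round trip in a temporary directory. The sketch below is only an illustration under stated assumptions: FolderModel and FolderModelDataset are hypothetical stand-ins so that it runs without Kedro or sentence-transformers installed. In a real project the dataset would inherit from kedro.io.AbstractDataset, as STDataset above does.

```python
import json
import tempfile
from pathlib import Path


class FolderModel:
    """Hypothetical stand-in for a model persisted as a folder of files."""

    def __init__(self, config: dict, weights: list):
        self.config = config
        self.weights = weights

    def save(self, folder: str) -> None:
        # Write several files into one directory, like a real model folder.
        path = Path(folder)
        path.mkdir(parents=True, exist_ok=True)
        (path / "config.json").write_text(json.dumps(self.config))
        (path / "weights.json").write_text(json.dumps(self.weights))

    @classmethod
    def from_folder(cls, folder: str) -> "FolderModel":
        path = Path(folder)
        config = json.loads((path / "config.json").read_text())
        weights = json.loads((path / "weights.json").read_text())
        return cls(config, weights)


class FolderModelDataset:
    """Same shape as STDataset, minus the Kedro base class (assumption)."""

    def __init__(self, filepath: str):
        self.filepath = filepath

    def load(self) -> FolderModel:
        return FolderModel.from_folder(self.filepath)

    def save(self, model: FolderModel) -> None:
        model.save(self.filepath)


with tempfile.TemporaryDirectory() as tmp:
    dataset = FolderModelDataset(str(Path(tmp) / "model_folder"))
    dataset.save(FolderModel({"max_seq_length": 128}, [0.1, 0.2, 0.3]))
    reloaded = dataset.load()
    # The "model" is a directory holding several files, not a single file.
    saved_files = sorted(p.name for p in Path(dataset.filepath).iterdir())
```

The dataset only delegates to the model's own folder-based save and load, which is why filepath in the catalog points at a directory.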
j
d
@Francis Duval - if you have a moment, it would be really interesting if you could test whether SentenceTransformers works with this.
f
Sure! I'll do this today
🙏 1
No, unfortunately, it is not working. I tried this:
sentence_transformer_model:
  type: huggingface.HFTransformerPipelineDataset
  model_name: data/06_models/mpnet_16_26epochs
The following may work, but I get an SSL error, and I think this is because my organization prevents us from downloading models directly from HuggingFace/SentenceTransformers.
sentence_transformer_model:
  type: huggingface.HFTransformerPipelineDataset
  model_name: sentence-transformers/all-mpnet-base-v2
j
you're not alone @Francis Duval 😅
😮 1