Massinissa Saïdi
03/31/2023, 12:29 PM
DataSetError: <class 'sklearn.pipeline.Pipeline'> was not serialised due to: Can't pickle local object 'fit_best_model.<locals>.<lambda>'
I just return a partitioned pickle dataset like this: return {'model_' + parameters['model']: pipeline}, and I define the dataset in catalog.yml like this:
models_partionned:
  type: PartitionedDataSet
  path: data/06_models/${date}/${target}/
  filename_suffix: ".pkl"
  dataset:
    type: pickle.PickleDataSet
marrrcin
03/31/2023, 12:34 PM
Massinissa Saïdi
03/31/2023, 12:34 PM
Deepyaman Datta
03/31/2023, 12:47 PM
fit_best_model?
Massinissa Saïdi
03/31/2023, 12:51 PM
Deepyaman Datta
03/31/2023, 12:52 PM
lambda in there?
Massinissa Saïdi
03/31/2023, 12:53 PM
TfidfVectorizer(tokenizer=lambda x: x.split(' '),...
Deepyaman Datta
03/31/2023, 12:56 PM
from operator import methodcaller
TfidfVectorizer(tokenizer=methodcaller('split', ' '),...
Massinissa Saïdi
03/31/2023, 12:56 PM
Deepyaman Datta
03/31/2023, 12:58 PM
' ' argument to split? By default, split already separates on any run of whitespace, so you only need the argument if you really want to split on every single space.
2. models_partionned is spelled wrong (if it's English) 😉
Massinissa Saïdi
03/31/2023, 1:00 PM
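For reference, here is a minimal sketch of the issue discussed in this thread, using only the standard library. It assumes the pickle dataset serialises with Python's built-in pickle module. The helper names make_tokenizers and can_pickle are hypothetical and only serve the illustration. It shows why a lambda defined inside a function cannot be pickled while the methodcaller replacement can, and how split(' ') differs from split().

```python
import pickle
from operator import methodcaller


def make_tokenizers():
    """Build both tokenizers inside a function, as a node function would (hypothetical helper)."""
    # A lambda defined inside a function is a "local object": pickle stores
    # functions by qualified name and cannot look this one up again on load.
    lambda_tokenizer = lambda x: x.split(' ')
    # methodcaller('split', ' ') calls x.split(' ') and supports pickling.
    picklable_tokenizer = methodcaller('split', ' ')
    return lambda_tokenizer, picklable_tokenizer


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # The exact exception type for local objects varies by Python version.
        return False


lambda_tokenizer, picklable_tokenizer = make_tokenizers()

lambda_ok = can_pickle(lambda_tokenizer)
methodcaller_ok = can_pickle(picklable_tokenizer)

# After a pickle round trip, the methodcaller tokenizes like the original lambda.
restored = pickle.loads(pickle.dumps(picklable_tokenizer))
same_tokens = restored("a b  c") == lambda_tokenizer("a b  c")

# split(' ') keeps an empty string between consecutive spaces; split() does not.
single_space = "a b  c".split(' ')
any_whitespace = "a b  c".split()

print(lambda_ok, methodcaller_ok, same_tokens)
print(single_space, any_whitespace)
```

If splitting on any run of whitespace is acceptable, methodcaller('split') with no extra argument, or the unbound str.split, should behave the same way and also pickle.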