# questions

Massinissa Saïdi

03/31/2023, 12:29 PM
Hello, is it possible to save a sklearn Pipeline object with pickle? I am getting this error:
DataSetError: <class 'sklearn.pipeline.Pipeline'> was not serialised due to: Can't pickle local object 'fit_best_model.<locals>.<lambda>'
I just return a dict for a partitioned pickle dataset, like this:
return {'model_' + parameters['model']: pipeline}
and I define the dataset in catalog.yml like this:
models_partionned:
  type: PartitionedDataSet
  path: data/06_models/${date}/${target}/
  filename_suffix: ".pkl"
  dataset:
    type: pickle.PickleDataSet

marrrcin

03/31/2023, 12:34 PM
Use the backend: cloudpickle param for the PickleDataSet (install cloudpickle first), or don't use lambdas in your sklearn Pipeline.
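For context on the error above: the standard pickle module stores functions by reference to an importable name, so a lambda created inside another function (reported here as fit_best_model.<locals>.<lambda>) cannot be serialised. The following is a minimal sketch using only the standard library. The body of fit_best_model and the can_pickle helper are hypothetical stand-ins, not the original poster's code.

```python
import pickle


def fit_best_model():
    # Hypothetical stand-in for the node: the tokenizer is a lambda
    # defined locally, as in the thread's TfidfVectorizer call.
    return lambda text: text.split(" ")


def can_pickle(obj):
    """Return True if the standard pickle module can serialise obj."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # The exception type for local objects differs across Python versions.
        return False


lambda_ok = can_pickle(fit_best_model())  # False: the lambda has no importable name
```

Any object that holds a reference to such a lambda, including a fitted Pipeline, fails in the same way when it is pickled.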

Massinissa Saïdi

03/31/2023, 12:34 PM
What are lambdas here, please?
🙄 1

Deepyaman Datta

03/31/2023, 12:47 PM
@Massinissa Saïdi Did you define a function called fit_best_model?

Massinissa Saïdi

03/31/2023, 12:51 PM
yes

Deepyaman Datta

03/31/2023, 12:52 PM
Can you share the definition? Or at least check if you used lambda in there?

Massinissa Saïdi

03/31/2023, 12:53 PM
Ooh, when Marcin said lambdas, he was talking about lambda functions. Yes, I used one:
TfidfVectorizer(tokenizer=lambda x: x.split(' '),...
👍 1

Deepyaman Datta

03/31/2023, 12:56 PM
You can define a separate function instead, or you may even be able to:
from operator import methodcaller

TfidfVectorizer(tokenizer=methodcaller('split', ' '),...
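As a rough check of the suggestion above, a methodcaller object can be round-tripped through the standard pickle module, unlike a local lambda. This sketch uses only the standard library. The sample sentence is made up for illustration, and it does not cover how TfidfVectorizer itself behaves with this tokenizer.

```python
import pickle
from operator import methodcaller

# Same callable as in the suggestion: calls .split(' ') on its argument.
tokenizer = methodcaller("split", " ")

# Serialise and restore it with the default pickle backend.
restored = pickle.loads(pickle.dumps(tokenizer))

tokens = restored("save the model")  # ['save', 'the', 'model']
```

A named function defined at module level in an importable module is the other option mentioned, and it pickles by reference to that module.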

Massinissa Saïdi

03/31/2023, 12:56 PM
ok nice thanks 🙂

Deepyaman Datta

03/31/2023, 12:58 PM
Also, just minor (unsolicited) notes:
1. Maybe you don't need to pass the ' ' argument to split? By default, split already separates on any run of whitespace, unless you really need it to split on a single space.
2. models_partionned is spelled wrong (if it's English) 😉
👍 1
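To illustrate the first note, here is a small sketch of how str.split behaves with and without the ' ' argument. The sample string is made up for illustration.

```python
text = "a  b\tc"  # two spaces between a and b, then a tab before c

# No argument: split on any run of whitespace and drop empty strings.
default_tokens = text.split()      # ['a', 'b', 'c']

# Explicit ' ': split on every single space only, so empty strings
# appear between consecutive spaces and the tab is left untouched.
space_tokens = text.split(" ")     # ['a', '', 'b\tc']
```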

Massinissa Saïdi

03/31/2023, 1:00 PM
Yes, I wrote too fast haha, thx