# questions
m
Hello, is it possible to save a sklearn pipeline object as a pickle? Because I have this error:
DataSetError: <class 'sklearn.pipeline.Pipeline'> was not serialised due to: Can't pickle local object 'fit_best_model.<locals>.<lambda>'
I just return a partitioned pickle dataset like this:
return {'model_' + parameters['model']: pipeline}
and I define the dataset in catalog.yml like this:
models_partionned:
  type: PartitionedDataSet
  path: data/06_models/${date}/${target}/
  filename_suffix: ".pkl"
  dataset:
    type: pickle.PickleDataSet
m
Use the backend: cloudpickle param for the PickleDataSet (install cloudpickle first), or don't use lambdas in your sklearn Pipeline.
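For context, a minimal sketch (outside Kedro) of why that works: the standard pickle module refuses to serialise a lambda, while cloudpickle serialises it by value. In the catalog above that would mean adding backend: cloudpickle under the pickle.PickleDataSet entry (recent Kedro versions accept this; cloudpickle must be installed).

import pickle

import cloudpickle  # pip install cloudpickle

tokenize = lambda x: x.split(' ')

# The standard library pickle cannot serialise a lambda; this is the same
# failure Kedro's PickleDataSet reports above.
try:
    pickle.dumps(tokenize)
except (pickle.PicklingError, AttributeError) as err:
    print(f"pickle failed: {err}")

# cloudpickle serialises the lambda by value, so it round-trips fine.
blob = cloudpickle.dumps(tokenize)
restored = cloudpickle.loads(blob)
print(restored("a b c"))  # ['a', 'b', 'c']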
m
what are lambdas here please?
d
@Massinissa Saïdi Did you define a function called fit_best_model?
m
yes
d
Can you share the definition? Or at least check if you used lambda in there?
m
ooh, when Marcin said lambda he meant the lambda function. Yes, I used it:
TfidfVectorizer(tokenizer=lambda x: x.split(' '),...
d
You can define a separate function instead, or you may even be able to:
from operator import methodcaller

TfidfVectorizer(tokenizer=methodcaller('split', ' '),...
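And if you prefer the "define a separate function" route, a minimal sketch (space_tokenizer is just an illustrative name):

from sklearn.feature_extraction.text import TfidfVectorizer

def space_tokenizer(text):
    """Split on single spaces, same behaviour as the original lambda."""
    return text.split(' ')

# A module-level named function is pickled by reference, so the fitted
# Pipeline can be saved with the plain pickle backend.
vectorizer = TfidfVectorizer(tokenizer=space_tokenizer)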
m
ok nice thanks 🙂
d
Also, just two minor (unsolicited) notes: 1. Maybe you don't need to pass the ' ' argument to split? By default, split already separates on any run of whitespace, unless you really need it to split on single spaces (quick example below). 2. models_partionned is spelled wrong (if it's English) 😉
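Quick example for note 1, assuming a string containing a double space and a tab:

text = "foo  bar\tbaz"

# split() with no argument collapses any run of whitespace
print(text.split())     # ['foo', 'bar', 'baz']

# split(' ') splits on every single space and keeps empty strings
print(text.split(' '))  # ['foo', '', 'bar\tbaz']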
m
yes I wrote too fast haha thx