How to reuse pickled objects in Python?

I have pickled some objects so that I can reuse them later. For example, I pickled three different gradient boosting regressors that I wanted to reuse later. However, when I tried to call the transform method on a reloaded regressor, Python complained that it needs to be fitted first. Below is the code:

import joblib

# models is a list containing three regressors
joblib.dump(models[0], 'gbm1.pkl')
joblib.dump(models[1], 'gbm2.pkl')
joblib.dump(models[2], 'gbm3.pkl')

Then I loaded them back into IPython.

gbm = []

gbm1 = joblib.load('gbm1.pkl')
gbm.append(gbm1)
gbm2 = joblib.load('gbm2.pkl')
gbm.append(gbm2)
gbm3 = joblib.load('gbm3.pkl')
gbm.append(gbm3)

Then I tried to run the transform() method to get a data matrix containing only the most important features.

#get the most important features from gbm1,gbm2,gbm3 (for each target)
train_dict = {} #new training data with most important features
val_dict = {}   #new val data with most important features
for clf,star in zip(gbm,['*','**','***']):
    train_dict[star] = clf.transform(train_X_tfidf)
    val_dict[star] = clf.transform(val_X_tfidf)

However, I am getting the following error (traceback):

NotFittedError                            Traceback (most recent call last)
<ipython-input-37-743077458c48> in <module>()
      3 val_dict = {}   #new val data with most important features
      4 for clf,star in zip(gbm,['*','**','***']):
----> 5     train_dict[star] = clf.transform(train_X_tfidf)
      6     val_dic[star] = clf.transform(val_X_tfidf)
      7 

//anaconda/lib/python2.7/site-packages/sklearn/feature_selection/from_model.pyc in transform(self, X, threshold)
     46         """
     47         check_is_fitted(self, ('coef_', 'feature_importances_'), 
---> 48                         all_or_any=any)
     49 
     50         X = check_array(X, 'csc')

//anaconda/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_is_fitted(estimator, attributes, msg, all_or_any)
    625 
    626     if not all_or_any([hasattr(estimator, attr) for attr in attributes]):
--> 627         raise NotFittedError(msg % {'name': type(estimator).__name__})

NotFittedError: This GradientBoostingRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

I thought that if I serialized a model with pickle, I could reuse it right away after loading it back. What am I doing wrong?

Thanks for your help.

There is 1 solution below

If you used cross-validation to evaluate the models, they may indeed still need to be fitted on the whole dataset before you pickle them, as proposed here.
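
To illustrate the point, here is a minimal sketch of how this can happen. It assumes the models were scored with a helper such as cross_val_score, which fits clones of the estimator and leaves the object you pass in unfitted. The synthetic data, the is_fitted helper, and the temporary file location are assumptions made for the example, not part of the original code. Pickling preserves whatever state the object had, so an unfitted model is still unfitted after loading.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Hypothetical helper: True if scikit-learn considers the estimator fitted."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


# Synthetic stand-in for the real training data (an assumption).
X, y = make_regression(n_samples=200, n_features=10, random_state=0)

model = GradientBoostingRegressor(n_estimators=20, random_state=0)

# Cross-validation fits clones of `model`; `model` itself stays unfitted.
scores = cross_val_score(model, X, y, cv=3)
fitted_after_cv = is_fitted(model)

# Fit on the whole dataset before dumping, so the fitted state is saved.
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "gbm1.pkl")
    joblib.dump(model, path)
    reloaded = joblib.load(path)

# The reloaded model carries the fitted state and can be used right away.
fitted_after_reload = is_fitted(reloaded)
print(fitted_after_cv, fitted_after_reload)
```

If is_fitted returns False for a model just before joblib.dump, the reloaded copy will raise the same NotFittedError, so checking at dump time is a quick way to confirm whether this is the cause.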