I have pickled some of my objects so that I can reuse them later. For example, I pickled three different gradient boosting regressors that I wanted to reuse later. However, when I tried to call the transform() method on a regressor, Python complained that it needs to be fitted first. Below is the code:
models  # a list containing three regressors
joblib.dump(models[0], 'gbm1.pkl')
joblib.dump(models[1], 'gbm2.pkl')
joblib.dump(models[2], 'gbm3.pkl')
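(As a quick sanity check before dumping, scikit-learn's check_is_fitted raises NotFittedError on an estimator that has never been fitted. A minimal sketch, using a throwaway estimator rather than the models list above:)

```python
# check_is_fitted raises NotFittedError when the estimator has no
# learned attributes yet -- useful to verify *before* dumping.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

unfitted = GradientBoostingRegressor()
try:
    check_is_fitted(unfitted)
    print('fitted')
except NotFittedError:
    print('not fitted')  # this branch runs: fit() was never called
```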
Then I reloaded them back into IPython:
gbm = []
gbm1 = joblib.load('gbm1.pkl')
gbm.append(gbm1)
gbm2 = joblib.load('gbm2.pkl')
gbm.append(gbm2)
gbm3 = joblib.load('gbm3.pkl')
gbm.append(gbm3)
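(For what it's worth, a minimal round trip with synthetic data shows that joblib does preserve fitted state, so if the loaded copy lacks feature_importances_, the dumped object was probably never fitted. A sketch, with made-up data:)

```python
# A *fitted* regressor keeps its learned attributes across a
# joblib dump/load round trip.
import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.random.RandomState(0).rand(50, 5)
y = 2.0 * X[:, 0] + X[:, 1]

model = GradientBoostingRegressor(n_estimators=10, random_state=0).fit(X, y)
joblib.dump(model, 'check.pkl')
loaded = joblib.load('check.pkl')

# True here means pickling is not the problem: the dumped model was
# fitted and the loaded copy still carries feature_importances_.
print(hasattr(loaded, 'feature_importances_'))
```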
Then I tried to run the transform() method to get a data matrix with the most important features.
#get the most important features from gbm1,gbm2,gbm3 (for each target)
train_dict = {}  # new training data with most important features
val_dict = {}    # new val data with most important features
for clf, star in zip(gbm, ['*', '**', '***']):
    train_dict[star] = clf.transform(train_X_tfidf)
    val_dict[star] = clf.transform(val_X_tfidf)
However, I am getting the following error (traceback):
NotFittedError Traceback (most recent call last)
<ipython-input-37-743077458c48> in <module>()
3 val_dict = {} #new val data with most important features
4 for clf,star in zip(gbm,['*','**','***']):
----> 5 train_dict[star] = clf.transform(train_X_tfidf)
6 val_dic[star] = clf.transform(val_X_tfidf)
7
//anaconda/lib/python2.7/site-packages/sklearn/feature_selection/from_model.pyc in transform(self, X, threshold)
46 """
47 check_is_fitted(self, ('coef_', 'feature_importances_'),
---> 48 all_or_any=any)
49
50 X = check_array(X, 'csc')
//anaconda/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_is_fitted(estimator, attributes, msg, all_or_any)
625
626 if not all_or_any([hasattr(estimator, attr) for attr in attributes]):
--> 627 raise NotFittedError(msg % {'name': type(estimator).__name__})
NotFittedError: This GradientBoostingRegressor instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
I thought that if I serialize an object with pickle, I can reuse it right away after loading it back. What am I doing wrong?
Thanks for your help.
If you used cross-validation, your models may indeed require fitting on the whole dataset, as proposed here.
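A sketch of that fix with placeholder data (train_X_tfidf and train_y here are made up): fit the estimator on the full training set before dumping it, and do the feature selection through SelectFromModel, since calling transform() directly on the estimator was deprecated in later scikit-learn releases. With prefit=True, SelectFromModel reuses the already-fitted feature_importances_ instead of refitting:

```python
import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectFromModel

# Placeholder data standing in for your TF-IDF matrix and targets.
rng = np.random.RandomState(0)
train_X_tfidf = rng.rand(60, 8)
train_y = train_X_tfidf[:, 0] - train_X_tfidf[:, 3]

# 1. Fit on the *whole* training set, then dump the fitted estimator.
clf = GradientBoostingRegressor(n_estimators=10, random_state=0)
clf.fit(train_X_tfidf, train_y)
joblib.dump(clf, 'gbm1_fitted.pkl')

# 2. After loading, wrap the fitted model in SelectFromModel;
#    prefit=True skips refitting and uses its feature_importances_.
loaded = joblib.load('gbm1_fitted.pkl')
selector = SelectFromModel(loaded, prefit=True, threshold='median')
reduced = selector.transform(train_X_tfidf)  # keeps the top features
```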