Extending xgboost.XGBClassifier

I am trying to define a class called XGBExtended that extends xgboost.XGBClassifier, the scikit-learn API for xgboost. I am running into an issue with the get_params method; the IPython session below illustrates it. get_params seems to return only the attributes I define within XGBExtended.__init__, while the attributes set by the parent init method (xgboost.XGBClassifier.__init__) are ignored. I am using IPython and running Python 2.7. Full system specs are at the bottom.

In [182]: import xgboost as xgb
     ...: 
     ...: class XGBExtended(xgb.XGBClassifier):
     ...:   def __init__(self, foo):
     ...:     super(XGBExtended, self).__init__()
     ...:     self.foo = foo
     ...: 
     ...: clf = XGBExtended(foo = 1)
     ...: 
     ...: clf.get_params()
     ...: 
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-182-431c4c3f334b> in <module>()
      8 clf = XGBExtended(foo = 1)
      9 
---> 10 clf.get_params()

/Users/andrewhannigan/lib/xgboost/python-package/xgboost/sklearn.pyc in get_params(self, deep)
    188         if isinstance(self.kwargs, dict):  # if kwargs is a dict, update params accordingly
    189             params.update(self.kwargs)
--> 190         if params['missing'] is np.nan:
    191             params['missing'] = None  # sklearn doesn't handle nan. see #4725
    192         if not params.get('eval_metric', True):

KeyError: 'missing'

So I've hit an error because 'missing' is not a key in the params dict within the XGBClassifier.get_params method. I enter the debugger to poke around:

In [183]: %debug
> /Users/andrewhannigan/lib/xgboost/python-package/xgboost/sklearn.py(190)get_params()
    188         if isinstance(self.kwargs, dict):  # if kwargs is a dict, update params accordingly
    189             params.update(self.kwargs)
--> 190         if params['missing'] is np.nan:
    191             params['missing'] = None  # sklearn doesn't handle nan. see #4725
    192         if not params.get('eval_metric', True):

ipdb> params
{'foo': 1}
ipdb> self.__dict__
{'n_jobs': 1, 'seed': None, 'silent': True, 'missing': nan, 'nthread': None, 'min_child_weight': 1, 'random_state': 0, 'kwargs': {}, 'objective': 'binary:logistic', 'foo': 1, 'max_depth': 3, 'reg_alpha': 0, 'colsample_bylevel': 1, 'scale_pos_weight': 1, '_Booster': None, 'learning_rate': 0.1, 'max_delta_step': 0, 'base_score': 0.5, 'n_estimators': 100, 'booster': 'gbtree', 'colsample_bytree': 1, 'subsample': 1, 'reg_lambda': 1, 'gamma': 0}
ipdb> 

As you can see, params contains only the foo variable, while the object itself holds all of the attributes set by xgboost.XGBClassifier.__init__. For some reason, the BaseEstimator.get_params method, which is called from xgboost.XGBClassifier.get_params, only picks up the parameters defined explicitly in XGBExtended.__init__. Explicitly calling get_params with deep=True does not help either:

ipdb> super(XGBModel, self).get_params(deep=True)
{'foo': 1}
ipdb> 

Can anyone tell why this is happening?

System specs:

In [186]: print IPython.sys_info()
{'commit_hash': u'1149d1700',
 'commit_source': 'installation',
 'default_encoding': 'UTF-8',
 'ipython_path': '/Users/andrewhannigan/virtualenvironment/nimble_ai/lib/python2.7/site-packages/IPython',
 'ipython_version': '5.4.1',
 'os_name': 'posix',
 'platform': 'Darwin-14.5.0-x86_64-i386-64bit',
 'sys_executable': '/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python',
 'sys_platform': 'darwin',
 'sys_version': '2.7.10 (default, Jul  3 2015, 12:05:53) \n[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)]'}

Answer

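The problem is how the child class declares its constructor. scikit-learn's BaseEstimator.get_params does not look at the attributes stored on the instance. It reads the parameter names from the signature of the class's own __init__ method and then looks up one attribute per name. Because XGBExtended.__init__ declares only foo, get_params reports only foo, even though the parent constructor has set every other attribute to its default value. XGBClassifier.get_params then fails when it looks up params['missing'].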
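To make the mechanism concrete, here is a minimal sketch that imitates it with the standard library only. It is not scikit-learn's actual code: ToyEstimator, ToyParent and ToyChild are hypothetical names made up for the illustration, and it uses inspect.signature, so it assumes Python 3 rather than the 2.7 from the question.

```python
import inspect


class ToyEstimator(object):
    """Hypothetical stand-in for an estimator base class (not scikit-learn code)."""

    @classmethod
    def _param_names(cls):
        # Read the parameter names from the signature of this class's own
        # __init__, skipping `self` and any *args / **kwargs.
        sig = inspect.signature(cls.__init__)
        skipped = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
        return sorted(
            name for name, param in sig.parameters.items()
            if name != "self" and param.kind not in skipped
        )

    def get_params(self):
        # One attribute lookup per name found in the signature.
        return {name: getattr(self, name) for name in self._param_names()}


class ToyParent(ToyEstimator):
    def __init__(self, max_depth=3, missing=None):
        self.max_depth = max_depth
        self.missing = missing


class ToyChild(ToyParent):
    # Same mistake as in the question: only `foo` is in the signature.
    def __init__(self, foo):
        super(ToyChild, self).__init__()
        self.foo = foo


child = ToyChild(foo=1)
print(sorted(vars(child)))  # ['foo', 'max_depth', 'missing']
print(child.get_params())   # {'foo': 1}
```

The instance carries all three attributes, but the toy get_params only sees the one name that ToyChild.__init__ declares, which mirrors what the debugger session in the question shows.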

Instead, declare every inherited parameter in the child's __init__ and pass it on to the parent:

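import xgboost as xgb

class XGBExtended(xgb.XGBClassifier):
    # Repeat every parent parameter so that get_params() can find it
    # in this class's own __init__ signature.
    def __init__(self, foo, max_depth=3, learning_rate=0.1,
                 n_estimators=100, silent=True,
                 objective="binary:logistic",
                 nthread=-1, gamma=0, min_child_weight=1,
                 max_delta_step=0, subsample=1, colsample_bytree=1,
                 colsample_bylevel=1,
                 reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
                 base_score=0.5, seed=0, missing=None, **kwargs):

        # Pass the inherited parameters to the parent class by keyword, so the
        # call does not depend on the argument order of XGBClassifier.__init__
        super(XGBExtended, self).__init__(
            max_depth=max_depth, learning_rate=learning_rate,
            n_estimators=n_estimators, silent=silent, objective=objective,
            nthread=nthread, gamma=gamma, min_child_weight=min_child_weight,
            max_delta_step=max_delta_step, subsample=subsample,
            colsample_bytree=colsample_bytree,
            colsample_bylevel=colsample_bylevel,
            reg_alpha=reg_alpha, reg_lambda=reg_lambda,
            scale_pos_weight=scale_pos_weight, base_score=base_score,
            seed=seed, missing=missing, **kwargs)

        # Store the custom parameter under the same name as the argument
        self.foo = foo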

With that change, get_params no longer raises an error and returns the inherited parameters as well:

clf = XGBExtended(foo = 1)
print(clf.get_params(deep=True))

>>> {'reg_alpha': 0, 'colsample_bytree': 1, 'silent': True, 
     'colsample_bylevel': 1, 'scale_pos_weight': 1, 'learning_rate': 0.1, 
     'missing': None, 'max_delta_step': 0, 'nthread': -1, 'base_score': 0.5, 
     'n_estimators': 100, 'subsample': 1, 'reg_lambda': 1, 'seed': 0, 
     'min_child_weight': 1, 'objective': 'binary:logistic', 
     'foo': 1, 'max_depth': 3, 'gamma': 0}
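Listing the full signature matters beyond get_params itself: as far as I know, scikit-learn's clone (used by grid search and cross-validation) rebuilds an estimator by calling its class with the values that get_params returns. The sketch below imitates that round trip with the standard library only. ToyParent, ToyChild and toy_clone are hypothetical names, not library code, and the example assumes Python 3 because it relies on inspect.signature.

```python
import inspect


class ToyParent(object):
    """Hypothetical base class; get_params reads names from __init__."""

    def __init__(self, max_depth=3, missing=None):
        self.max_depth = max_depth
        self.missing = missing

    def get_params(self):
        # Names come from the signature of the concrete class's __init__.
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        return {name: getattr(self, name) for name in names}


class ToyChild(ToyParent):
    # Repeat the parent's parameters so they appear in the signature.
    def __init__(self, foo, max_depth=3, missing=None):
        super(ToyChild, self).__init__(max_depth=max_depth, missing=missing)
        self.foo = foo


def toy_clone(estimator):
    # Rebuild a fresh copy from the constructor parameters alone.
    return type(estimator)(**estimator.get_params())


original = ToyChild(foo=1, max_depth=5)
copy = toy_clone(original)
print(copy.get_params())  # {'foo': 1, 'max_depth': 5, 'missing': None}
```

If ToyChild declared only foo, the copy would silently fall back to max_depth=3, because the non-default value would never reach get_params.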