I am trying to define a class called `XGBExtended` that extends the class `xgboost.XGBClassifier`, the scikit-learn API for xgboost. I am running into some issues with the `get_params` method. Below is an IPython session illustrating the issue. Basically, `get_params` seems to return only the attributes I define within `XGBExtended.__init__`, and attributes defined during the parent init method (`xgboost.XGBClassifier.__init__`) are ignored. I am using IPython and running Python 2.7. Full system specs are at the bottom.
```
In [182]: import xgboost as xgb
     ...:
     ...: class XGBExtended(xgb.XGBClassifier):
     ...:     def __init__(self, foo):
     ...:         super(XGBExtended, self).__init__()
     ...:         self.foo = foo
     ...:
     ...: clf = XGBExtended(foo = 1)
     ...:
     ...: clf.get_params()
     ...:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-182-431c4c3f334b> in <module>()
      8 clf = XGBExtended(foo = 1)
      9
---> 10 clf.get_params()

/Users/andrewhannigan/lib/xgboost/python-package/xgboost/sklearn.pyc in get_params(self, deep)
    188         if isinstance(self.kwargs, dict): # if kwargs is a dict, update params accordingly
    189             params.update(self.kwargs)
--> 190         if params['missing'] is np.nan:
    191             params['missing'] = None # sklearn doesn't handle nan. see #4725
    192         if not params.get('eval_metric', True):

KeyError: 'missing'
```
So I've hit an error because `'missing'` is not a key in the `params` dict within the `XGBClassifier.get_params` method. I enter the debugger to poke around:
```
In [183]: %debug
> /Users/andrewhannigan/lib/xgboost/python-package/xgboost/sklearn.py(190)get_params()
    188         if isinstance(self.kwargs, dict): # if kwargs is a dict, update params accordingly
    189             params.update(self.kwargs)
--> 190         if params['missing'] is np.nan:
    191             params['missing'] = None # sklearn doesn't handle nan. see #4725
    192         if not params.get('eval_metric', True):

ipdb> params
{'foo': 1}
ipdb> self.__dict__
{'n_jobs': 1, 'seed': None, 'silent': True, 'missing': nan, 'nthread': None, 'min_child_weight': 1, 'random_state': 0, 'kwargs': {}, 'objective': 'binary:logistic', 'foo': 1, 'max_depth': 3, 'reg_alpha': 0, 'colsample_bylevel': 1, 'scale_pos_weight': 1, '_Booster': None, 'learning_rate': 0.1, 'max_delta_step': 0, 'base_score': 0.5, 'n_estimators': 100, 'booster': 'gbtree', 'colsample_bytree': 1, 'subsample': 1, 'reg_lambda': 1, 'gamma': 0}
ipdb>
```
As you can see, `params` contains only the `foo` variable. However, the object itself contains all of the parameters defined by `xgboost.XGBClassifier.__init__`. For some reason, the `BaseEstimator.get_params` method, which is called from `xgboost.XGBClassifier.get_params`, only picks up the parameters defined explicitly in the `XGBExtended.__init__` method. Unfortunately, even if I explicitly call `get_params` with `deep=True`, it still does not work correctly:
```
ipdb> super(XGBModel, self).get_params(deep=True)
{'foo': 1}
ipdb>
```
Can anyone tell why this is happening?
System specs:

```
In [186]: print IPython.sys_info()
{'commit_hash': u'1149d1700',
 'commit_source': 'installation',
 'default_encoding': 'UTF-8',
 'ipython_path': '/Users/andrewhannigan/virtualenvironment/nimble_ai/lib/python2.7/site-packages/IPython',
 'ipython_version': '5.4.1',
 'os_name': 'posix',
 'platform': 'Darwin-14.5.0-x86_64-i386-64bit',
 'sys_executable': '/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python',
 'sys_platform': 'darwin',
 'sys_version': '2.7.10 (default, Jul 3 2015, 12:05:53) \n[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)]'}
```
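The problem here is the declaration of the child class. scikit-learn's `BaseEstimator.get_params` does not look at the attributes stored on the object; it builds the parameter dict from the argument names in the constructor signature of the concrete class, here `XGBExtended.__init__`. Because you declared that `__init__` with only `foo`, you are overriding the original signature: the attributes set by `XGBClassifier.__init__` (with their default values) do exist on the object, as your `self.__dict__` output shows, but they are never reported, so `params['missing']` raises `KeyError`. Passing `deep=True` does not change this, because `deep` only controls whether the parameters of nested estimators are included. To fix it, declare the parent's parameters explicitly in the subclass `__init__` (with the same defaults) alongside `foo`, and pass them on in the `super(XGBExtended, self).__init__(...)` call.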
After that you will not get any error.
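A minimal sketch of the mechanism, written for Python 3 and using only the standard library so it runs without xgboost or scikit-learn. `MiniBaseEstimator`, `Parent`, `BrokenChild` and `FixedChild` are hypothetical stand-ins: `MiniBaseEstimator.get_params` is a simplified imitation of how scikit-learn's `BaseEstimator.get_params` reads parameter names from the concrete class's `__init__` signature (skipping `self` and `**kwargs`), not the library's actual code.

```python
import inspect


class MiniBaseEstimator:
    """Hypothetical, simplified stand-in for sklearn.base.BaseEstimator."""

    @classmethod
    def _get_param_names(cls):
        # Read parameter names from the constructor signature of the
        # *concrete* class, skipping 'self' and any **kwargs catch-all.
        sig = inspect.signature(cls.__init__)
        return sorted(
            name
            for name, p in sig.parameters.items()
            if name != "self" and p.kind != inspect.Parameter.VAR_KEYWORD
        )

    def get_params(self, deep=True):
        # 'deep' would only matter for nested estimators; unused here.
        return {name: getattr(self, name) for name in self._get_param_names()}


class Parent(MiniBaseEstimator):
    """Hypothetical stand-in for XGBClassifier with two defaulted params."""

    def __init__(self, max_depth=3, missing=None):
        self.max_depth = max_depth
        self.missing = missing


class BrokenChild(Parent):
    # Mirrors the question: the signature only mentions 'foo'.
    def __init__(self, foo):
        super().__init__()
        self.foo = foo


class FixedChild(Parent):
    # Re-declare the parent's parameters (same defaults) and forward them.
    def __init__(self, foo, max_depth=3, missing=None):
        super().__init__(max_depth=max_depth, missing=missing)
        self.foo = foo


broken = BrokenChild(foo=1)
print(broken.__dict__)       # Parent's attributes are set on the object...
print(broken.get_params())   # ...but only 'foo' is reported
print(FixedChild(foo=1).get_params())
```

For the real `XGBExtended`, the same idea means listing every keyword argument of `XGBClassifier.__init__` in the subclass signature, with the defaults of your installed xgboost version, and forwarding them to the parent; the exact list differs between xgboost releases. Forwarding a bare `**kwargs` is not enough on its own, since the signature introspection skips it.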