Get survival and hazard scores using XGBoost AFT implementation


I am following the XGBoost AFT tutorial at https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html to build a survival analysis model, and I trained the model as shown in its example. However, calling predict() on the test DMatrix returns a single value per observation, which appears to be the predicted time to event.

Is there a way to get probability scores over time from this model, that is, the survival and hazard values for each observation? I have also used the lifelines package to predict hazard; it outputs a matrix of probabilities for each time window, which is interpretable and actionable. It is unclear whether the same information can be derived from XGBoost. Please help.

import numpy as np
import pandas as pd
import xgboost as xgb

dtrain = xgb.DMatrix(X)

# mlmodel.E_train is the boolean event indicator
# mlmodel.T_train is the time elapsed when the event is either observed or censored
# Observed event: lower == upper == T. Right-censored: the interval is [T, +inf),
# so the lower bound is T in both cases (not 0 for censored rows).
y_lower_bound = np.asarray(mlmodel.T_train, dtype=float)
y_upper_bound = np.where(mlmodel.E_train == 1, mlmodel.T_train, np.inf)

dtrain.set_float_info('label_lower_bound', y_lower_bound)
dtrain.set_float_info('label_upper_bound', y_upper_bound)

params = {'objective': 'survival:aft',
          'eval_metric': 'aft-nloglik',
          'aft_loss_distribution': 'normal',
          'aft_loss_distribution_scale': 1.20,
          'tree_method': 'hist', 'learning_rate': 0.05, 'max_depth': 2}

bst = xgb.train(params, dtrain, num_boost_round=5, evals=[(dtrain, 'train')])

mlmodel.model = bst

dtest = xgb.DMatrix(self.X_test)
# Despite the attribute name, predict() returns one predicted survival time per row, not a hazard
mlmodel.y_pred_hazard = pd.DataFrame(mlmodel.model.predict(dtest))
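
For reference, here is a rough sketch of how survival and hazard values could be derived by hand from the single predicted time, assuming the AFT model from the tutorial, ln T = f(x) + sigma * Z with Z standard normal, and assuming predict() returns exp(f(x)) on the original time scale. Under those assumptions S(t | x) = 1 - Phi((ln t - f(x)) / sigma). The function names, the example predictions, and the time grid below are hypothetical and are not part of the XGBoost API; sigma has to match the 'aft_loss_distribution_scale' used in training.

```python
import math
from statistics import NormalDist

# Standard normal noise, matching 'aft_loss_distribution': 'normal' (assumption)
_STD_NORMAL = NormalDist()


def aft_survival(pred_time, t, sigma):
    """S(t | x) = 1 - Phi((ln t - ln pred_time) / sigma)."""
    z = (math.log(t) - math.log(pred_time)) / sigma
    return 1.0 - _STD_NORMAL.cdf(z)


def aft_hazard(pred_time, t, sigma):
    """h(t | x) = f(t | x) / S(t | x), where f is the log-normal density."""
    z = (math.log(t) - math.log(pred_time)) / sigma
    density = _STD_NORMAL.pdf(z) / (sigma * t)
    return density / (1.0 - _STD_NORMAL.cdf(z))


def survival_matrix(pred_times, time_grid, sigma):
    """One row per observation, one column per time point in time_grid."""
    return [[aft_survival(p, t, sigma) for t in time_grid] for p in pred_times]


sigma = 1.20                          # same value as aft_loss_distribution_scale
pred_times = [10.0, 50.0]             # hypothetical stand-ins for bst.predict(dtest)
time_grid = [5.0, 10.0, 20.0, 50.0]   # hypothetical evaluation times

surv = survival_matrix(pred_times, time_grid, sigma)
haz = [[aft_hazard(p, t, sigma) for t in time_grid] for p in pred_times]
```

With a normal noise term the predicted time is the median of the implied distribution, so each row of surv crosses 0.5 at that observation's predicted time. The resulting curves are only as good as the fixed distribution and scale chosen for training.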