I am using the XGBoost AFT survival analysis tutorial (https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html) to build a survival model with XGBoost, and I trained the model by following the example in that documentation. However, when I call `predict(dtest)` on the trained booster, I get a single value for each observation, which is the predicted time to event.
My question: is there a way to get probability scores over time from this model? What are the survival and hazard values for these observations? With the lifelines package I can predict the hazard, and it returns a matrix of probabilities for each time window, which is interpretable and actionable. With XGBoost, it is unclear whether this information can be derived. Any help is appreciated.
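For reference, the tutorial describes the AFT model roughly as $\ln T = \eta(x) + \sigma Z$, where $\eta(x)$ is the raw ensemble output, $Z$ follows the distribution set by `aft_loss_distribution`, and $\sigma$ is `aft_loss_distribution_scale` (the symbols here are my own). Assuming that formulation, and assuming `predict()` returns $e^{\eta(x)}$ (which is the median survival time when $Z$ is standard normal), the survival and hazard functions for the `'normal'` setting used below would follow directly. This is a derivation sketch, not an official XGBoost API:

$$
\begin{aligned}
\ln T &= \eta(x) + \sigma Z, \qquad Z \sim \mathcal{N}(0, 1) \\
\hat{T}(x) &= e^{\eta(x)} \\
z(t) &= \frac{\ln t - \eta(x)}{\sigma} \\
S(t \mid x) &= P(T > t \mid x) = 1 - \Phi\big(z(t)\big) \\
f_T(t \mid x) &= \frac{\phi\big(z(t)\big)}{\sigma\, t} \\
h(t \mid x) &= \frac{f_T(t \mid x)}{S(t \mid x)}
\end{aligned}
$$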
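```python
import numpy as np
import pandas as pd
import xgboost as xgb

# X is the training feature matrix
dtrain = xgb.DMatrix(X)
# mlmodel.E_train is the boolean event indicator
# mlmodel.T_train is the time elapsed when the event is either observed or censored
# Observed event: the event time is exactly T, so the interval is [T, T].
# Right-censored: the event happens at some time >= T, so the interval is [T, +inf).
y_lower_bound = np.asarray(mlmodel.T_train, dtype=float)
y_upper_bound = np.where(mlmodel.E_train == 1, mlmodel.T_train, np.inf)
dtrain.set_float_info('label_lower_bound', y_lower_bound)
dtrain.set_float_info('label_upper_bound', y_upper_bound)
params = {'objective': 'survival:aft',
          'eval_metric': 'aft-nloglik',
          'aft_loss_distribution': 'normal',
          'aft_loss_distribution_scale': 1.20,
          'tree_method': 'hist', 'learning_rate': 0.05, 'max_depth': 2}
bst = xgb.train(params, dtrain, num_boost_round=5, evals=[(dtrain, 'train')])
mlmodel.model = bst
# X_test is assumed to live on mlmodel, like the training data
dtest = xgb.DMatrix(mlmodel.X_test)
# predict() returns one predicted survival time per row (not a hazard)
mlmodel.y_pred_hazard = pd.DataFrame(mlmodel.model.predict(dtest))
```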
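A minimal NumPy sketch of those formulas, assuming `aft_loss_distribution='normal'` and that `sigma` equals the `aft_loss_distribution_scale` used in training. The function name `aft_normal_survival` is hypothetical (not part of XGBoost); it turns per-row log-scale predictions into survival and hazard matrices with one row per observation and one column per time point:

```python
import math

import numpy as np

# Standard normal survival function 1 - Phi(z), via erfc for better accuracy in the upper tail.
_norm_sf = np.vectorize(lambda z: 0.5 * math.erfc(z / math.sqrt(2.0)), otypes=[float])


def _norm_pdf(z):
    """Standard normal density phi(z)."""
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)


def aft_normal_survival(log_time_pred, times, sigma):
    """Hypothetical helper: survival and hazard curves for a 'normal' AFT model.

    log_time_pred: raw log-scale predictions eta(x), one per observation
    times: 1-D array of positive time points at which to evaluate the curves
    sigma: the aft_loss_distribution_scale used during training
    Returns two arrays of shape (n_observations, n_times): S(t|x) and h(t|x).
    """
    eta = np.asarray(log_time_pred, dtype=float)[:, None]  # shape (n, 1)
    t = np.asarray(times, dtype=float)[None, :]            # shape (1, m)
    z = (np.log(t) - eta) / sigma                          # standardized log-time
    surv = _norm_sf(z)                                     # S(t|x) = 1 - Phi(z)
    density = _norm_pdf(z) / (sigma * t)                   # f_T(t|x) = phi(z) / (sigma t)
    with np.errstate(divide="ignore", invalid="ignore"):
        hazard = density / surv                            # h = f_T / S; inf/nan where S underflows to 0
    return surv, hazard
```

Usage, assuming the trained `bst` and `dtest` from the question: `margin = bst.predict(dtest, output_margin=True)` should give the log-scale prediction $\eta(x)$, and `surv, hazard = aft_normal_survival(margin, np.linspace(1, 365, 50), sigma=1.20)` evaluates both curves on an illustrative grid of 50 time points (choose units matching `T_train`). For the `'logistic'` or `'extreme'` distributions, the CDF and PDF of $Z$ would have to be replaced accordingly.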