Calculating confidence intervals for coefficients in scikit-survival


I'm trying out Cox proportional hazards regression in Python using scikit-survival, and I was wondering whether it's possible to calculate standard errors or confidence intervals for the log hazard coefficients.

Python code (largely lifted from the tutorial on GitHub: https://nbviewer.jupyter.org/github/sebp/scikit-survival/blob/master/examples/00-introduction.ipynb):

from sksurv.datasets import load_veterans_lung_cancer
from sksurv.preprocessing import OneHotEncoder
import sksurv.linear_model as sks
import pandas as pd

data_x, data_y = load_veterans_lung_cancer()
data_x_n = OneHotEncoder().fit_transform(data_x)
est = sks.CoxPHSurvivalAnalysis()
est.fit(data_x_n, data_y)
print(pd.Series(est.coef_, index=data_x_n.columns).sort_values(ascending=False))

Output

Treatment=test           0.289936
Prior_therapy=yes        0.072327
Months_from_Diagnosis   -0.000092
Age_in_years            -0.008549
Karnofsky_score         -0.032622
Celltype=smallcell      -0.331813
Celltype=large          -0.788672
Celltype=squamous       -1.188299
dtype: float64

If I run the same analysis in R using the survival library:

library(survival)  # package names are case-sensitive in R

model = coxph(
  Surv(Survival_in_days, Status) ~ 
    Age_in_years + 
    Celltype.large + 
    Celltype.smallcell + 
    Celltype.squamous + 
    Karnofsky_score + 
    Months_from_Diagnosis + 
    Prior_therapy.yes + 
    Treatment.test,
  data = data_s,
  ties = "breslow"
  )
print(model)

This is the output:

Call:
coxph(formula = Surv(Survival_in_days, Status) ~ Age_in_years + 
    Celltype.large + Celltype.smallcell + Celltype.squamous + 
    Karnofsky_score + Months_from_Diagnosis + Prior_therapy.yes + 
    Treatment.test, data = data_s, ties = "breslow")

                           coef exp(coef)  se(coef)     z       p
Age_in_years          -0.008549  0.991487  0.009304 -0.92  0.3582
Celltype.large        -0.788671  0.454448  0.302668 -2.61  0.0092
Celltype.smallcell    -0.331813  0.717622  0.275590 -1.20  0.2286
Celltype.squamous     -1.188299  0.304739  0.300763 -3.95 7.8e-05
Karnofsky_score       -0.032622  0.967905  0.005505 -5.93 3.1e-09
Months_from_Diagnosis -0.000092  0.999908  0.009125 -0.01  0.9920
Prior_therapy.yes      0.072327  1.075006  0.232132  0.31  0.7554
Treatment.test         0.289936  1.336342  0.207210  1.40  0.1617

Likelihood ratio test=61.4  on 8 df, p=2.46e-10
n= 137, number of events= 128 

The coefficients are the same, but I'd really like a way to calculate the standard error (labelled se(coef) in the R output) or the confidence intervals for each coefficient.

Thanks very much!
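
For reference, one possible way to get the se(coef) column by hand is to invert the observed information matrix (the negative Hessian of the partial log-likelihood) at the fitted coefficients. The sketch below is not part of scikit-survival: `cox_standard_errors` is a hypothetical helper name, and it assumes an unpenalized fit (alpha=0) with Breslow handling of ties, which matches the R call above. The interval is a plain Wald interval using the 97.5% normal quantile.

```python
import numpy as np


def cox_standard_errors(X, time, event, beta, z=1.959964):
    """Wald standard errors and confidence bounds for Cox coefficients.

    Hypothetical helper (not a scikit-survival function). Assumes an
    unpenalized Cox model and the Breslow approximation for tied events.
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    beta = np.asarray(beta, dtype=float)

    risk = np.exp(X @ beta)  # relative risk exp(x_i' beta) per subject
    n_features = X.shape[1]
    info = np.zeros((n_features, n_features))

    # Accumulate the observed information over the distinct event times.
    for t in np.unique(time[event]):
        at_risk = time >= t                     # risk set at time t
        d = np.sum(event & (time == t))         # number of events tied at t
        w = risk[at_risk]
        Xr = X[at_risk]
        s0 = w.sum()                            # sum of weights
        s1 = w @ Xr                             # weighted sum of covariates
        s2 = (Xr * w[:, None]).T @ Xr           # weighted sum of outer products
        mean = s1 / s0
        # Weighted covariance of the covariates within the risk set.
        info += d * (s2 / s0 - np.outer(mean, mean))

    # Raises LinAlgError if the information matrix is singular.
    cov = np.linalg.inv(info)
    se = np.sqrt(np.diag(cov))
    return se, beta - z * se, beta + z * se
```

With the fitted model from the question, this could be called as `cox_standard_errors(data_x_n.values, data_y["Survival_in_days"], data_y["Status"], est.coef_)`, assuming `data_y` is the structured array returned by `load_veterans_lung_cancer()` with those two field names. If the assumptions hold, the first returned array should be comparable to the se(coef) column in the R output.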
