How to extract the survival rate from a ctree in R?

262 Views Asked by At

The Curve

K <- HF %>% 
  filter(serum_creatinine <= 1.8, ejection_fraction > 25, age > 79)

pred_k <- ctree(Surv(time, DEATH_EVENT) ~ ., data = K)

https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records

I'm wanting to get the survival rate at t=150. This is my first ever Kaplan Meier model. Just don't know how to extract it.

Using summary(pred_k) doesn't return calculations for some reason.

1

There are 1 best solutions below

4
Achim Zeileis On BEST ANSWER

To apply the summary() method to the Kaplan-Meier estimates you need to extract the survfit object first. You can do so either by re-fitting survfit() to all of the terminal nodes of the tree simultaneously. Or, alternatively, by using predict() to obtain the fitted Kaplan-Meier curve for every individual observation.

To make the example easily reproducible, let's consider the following illustration from ?ctree in partykit:

library("partykit")
library("survival")
data("GBSG2", package = "TH.data")
ct <- ctree(Surv(time, cens) ~ ., data = GBSG2)
plot(ct)

plot of ctree survival tree on GBSG2 data

Now we can extract the IDs of the terminal nodes (3, 4, 6, 7) for every observation in the learning data by:

nd <- factor(predict(ct, type = "node"))

And then we can build the corresponding four-group survfit object for the entire learning data and work with it "as usual" in survival.

sf <- survfit(Surv(time, cens) ~ nd, data = GBSG2)
summary(sf, time = 1500)
## Call: survfit(formula = Surv(time, cens) ~ nd, data = GBSG2)
## 
##                 nd=3 
##         time       n.risk      n.event     survival      std.err lower 95% CI upper 95% CI 
##     1.50e+03     8.30e+01     7.70e+01     6.31e-01     3.45e-02     5.67e-01     7.03e-01 
## 
##                 nd=4 
##         time       n.risk      n.event     survival      std.err lower 95% CI upper 95% CI 
##     1.50e+03     6.40e+01     2.30e+01     7.89e-01     3.96e-02     7.15e-01     8.71e-01 
## 
##                 nd=6 
##         time       n.risk      n.event     survival      std.err lower 95% CI upper 95% CI 
##     1.50e+03     2.00e+01     9.80e+01     2.29e-01     4.01e-02     1.62e-01     3.22e-01 
## 
##                 nd=7 
##         time       n.risk      n.event     survival      std.err lower 95% CI upper 95% CI 
##     1.50e+03     4.70e+01     6.80e+01     5.19e-01     4.38e-02     4.40e-01     6.13e-01 
plot(sf, col = 1:4)

plot of survival curves from four-group survfit on GBSG2 data

Alternatively, you can extract the survfit for an individual observation as well, here we extract the object for the first observation in the learning data:

sf1 <- predict(ct, type = "prob")[[1]]
summary(sf1, time = 1500)
## Call: survfit(formula = y ~ 1, weights = w, subset = w > 0)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##  1500     83      77    0.631  0.0345        0.567        0.703
plot(sf1)

plot of the predicted survival curve for the first observation in the GBSG2 data