I have survival data with competing risk (censored 0, relaps 1, death 2) and I want to use Grey's test to check the difference between groups of a variable rh (categories 0, 1 and 2). This is my data:
> surv_data
# A tibble: 344 × 4
ID time_to_event status rh
<dbl> <dbl> <dbl> <fct>
1 1 61.2 0 PHLF 0
2 2 6.53 1 PHLF 0
3 10 109. 0 PHLF 0
4 15 15.5 2 PHLF 2
5 19 4.97 0 PHLF 0
6 20 17.9 1 PHLF 1
7 22 61.7 1 PHLF 0
8 23 74.0 2 PHLF 0
9 26 67.9 2 PHLF 1
10 27 46.8 2 PHLF 0
# ℹ 389 more rows
# ℹ Use `print(n = ...)` to see more rows
What is the difference between Gray's test in tidycmprsk::tbl_cuminc and tidycmprsk::cuminc. I get different outputs.
I tried the following below, and I don't understand why I get different outputs of the tests. Hpw should i interpret the results and what is the right way to do Gray's test?
1)
tidycmprsk::cuminc(Surv(time_to_event, status) ~ rh, data = surv_data) %>%
tidycmprsk::tbl_cuminc(
times = c(6, 12, 24), label_header = "**Month {time}**"
) %>%
add_p() %>%
add_n()
and I get output:
`
Characteristic N Month 6 Month 12 Month 24 p-value1
rh 399 0.07
rh 0 32% (15%, 33%) 45% (39%, 50%) 62% (36%, 65%)
rh 1 23% (9%, 28%) 37% (21%, 53%) 46% (38%, 68%)
rh 2 28% (12%, 42%) 27% (15%, 63%) 38% (21%, 55%)
1 Gray’s Test
`
tidycmprsk::cuminc(ftime=surv_data$time_to_event, fstatus=surv_data$status, group=surv_data$rh, cencode=0)
and I get output:
`
Tests:
stat pv df
Censored 2.20168 1.176649e-01 2
Relapse 5.75114 2.376312e-01 2
Death 28.383626 3.183201e-09 2
`
tidycmprsk::cuminc(Surv(time_to_event, status) ~ rh, data = surv_data, cencode=0)
and I get output:
`
• Tests
outcome statistic df p.value
Relapse 4.31 2.00 0.07
Death 32.3 2.00 <0.001
`
You have got yourself into difficulty because you are mixing
tidycmprskandcmprsk. Both of these packages have a function calledcuminc(). Your second code snippet is:However, the docs for
tidycmprsk::cuminc()indicate it either takes a formula or a parameter calledx. The parameters you are providing (ftime,fstatusand so on) are the arguments forcmprsk::cuminc(). As all your parameters are named and none of them isx, this code will generate an error.The
tidycmprsk::cuminc()source callscmprsk::cuminc()under the hood, so once you resolve your argument names you will get the same results. Here's an example with thetrialdata built-in to thetidycmprskpackage, using both approaches.tidycmprskNote that here I am using
death_cras the status variable. As the docs state:Make sure that your data is in this form.
cmprskThis is the equivalent using
cmprsk. Again as we are using the factor variabledeath_cras the status, we can providecencode = "censor", i.e. a character vector indicating the label of the factor level associated with being censored (rather thancencode = 0in your question).You can see that the p-value and statistic are the same.
A general comment on namespaces
Where packages have overlapping functions (these two also have
crr()in common), there are various approaches. In my example I loadedtidycmprskwithlibrary(), and calledcmprskfunctions explicitly with::. This is about the minimum you should do in these cases. The Google R Style Guide in fact suggests that:Of course this advice makes more sense for a huge company with a massive codebase than a single researcher who may not generally face a huge risk of namespace collisions.
Another approach which has many advantages is to use the
boxpackage. This allows you to (amongst other things) explicitly import the functions you want to use from specific packages rather than importing all functions from a package, whichlibrary()does.