I'm having some trouble using coxph(). I have two categorical variables, "tecnologia" and "pais", and I want to evaluate the possible interaction effect of "pais" on "tecnologia". "tecnologia" is a factor with 2 levels: gps and convencional. "pais" also has 2 levels: PT and ES. I have no idea why this warning keeps appearing. Here's the code and the output:
cox_AC<-coxph(Surv(dados_temp$dias_seg,dados_temp$status)~tecnologia*pais,data=dados_temp)
Warning message:
In coxph(Surv(dados_temp$dias_seg, dados_temp$status) ~ tecnologia * :
X matrix deemed to be singular; variable 3
> cox_AC
Call:
coxph(formula = Surv(dados_temp$dias_seg, dados_temp$status) ~
tecnologia * pais, data = dados_temp)
coef exp(coef) se(coef) z p
tecnologiagps -0.152 0.859 0.400 -0.38 7e-01
paisPT 1.469 4.345 0.406 3.62 3e-04
tecnologiagps:paisPT NA NA 0.000 NA NA
Likelihood ratio test=23.8 on 2 df, p=6.82e-06 n= 127, number of events= 64
I'm opening another question on this subject, although I asked a similar one some months ago, because I'm facing the same problem again with other data. And this time I'm sure it's not a data-related problem.
Can somebody help me? Thank you
UPDATE: The problem does not seem to be a perfect classification
> xtabs(~status+tecnologia,data=dados)
tecnologia
status conv doppler gps
0 39 6 24
1 30 3 34
> xtabs(~status+pais,data=dados)
pais
status ES PT
0 71 8
1 49 28
> xtabs(~tecnologia+pais,data=dados)
pais
tecnologia ES PT
conv 69 0
doppler 1 8
gps 30 28
Here's a simple example which seems to reproduce your problem:
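A minimal sketch of such an example, assuming a hypothetical six-row data frame `df1` (the time column `t1` and all values are illustrative assumptions, not the asker's data). The interaction column `te1 * pa1` is all zeros here, so the fit is expected to complain about a singular design matrix, much like the warning in the question:

```r
library(survival)

# Hypothetical toy data (values chosen for illustration only):
# t1 = follow-up time, s1 = status (1 = event), te1 and pa1 = binary predictors
df1 <- data.frame(
  t1  = 1:6,
  s1  = c(0, 1, 0, 1, 0, 1),
  te1 = c(0, 0, 0, 1, 1, 1),
  pa1 = c(0, 0, 1, 0, 0, 0)
)

# Main effects plus their interaction, same formula shape as in the question
fit_int <- coxph(Surv(t1, s1) ~ te1 * pa1, data = df1)
fit_int
```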
Now let's look for 'perfect classification' like so:
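One way to run that check, sketched on hypothetical values for `s1` and `pa1` (assumptions for illustration), is a cross table of status against the predictor:

```r
# Hypothetical toy data: s1 = status, pa1 = binary predictor
df1 <- data.frame(
  s1  = c(0, 1, 0, 1, 0, 1),
  pa1 = c(0, 0, 1, 0, 0, 0)
)

# Rows are status, columns are the predictor; an empty cell is the thing to look for
tab <- xtabs(~ s1 + pa1, data = df1)
tab
```

In this toy table the cell for `s1 = 1`, `pa1 = 1` is empty.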
Note that a value of `1` for `pa1` exactly predicts having a status `s1` equal to `0`. That is to say, based on your data, if you know that `pa1==1` then you can be sure that `s1==0`. Thus fitting Cox's model is not appropriate in this setting and will result in numerical errors. This can be seen when the model is fitted.
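A sketch of such a fit on hypothetical data (the column `t1` and all values are assumptions), using `pa1` as the only predictor. Since no subject with `pa1 == 1` has an event, the estimate for `pa1` is expected to drift towards minus infinity, and `coxph` should warn that it may be infinite:

```r
library(survival)

# Hypothetical toy data: the single subject with pa1 == 1 is censored
df1 <- data.frame(
  t1  = 1:6,
  s1  = c(0, 1, 0, 1, 0, 1),
  pa1 = c(0, 0, 1, 0, 0, 0)
)

fit_pa <- coxph(Surv(t1, s1) ~ pa1, data = df1)

# Inspect the coefficient and its standard error for plausibility
summary(fit_pa)$coefficients
```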
It's important to look at these cross tables before fitting models. Also it's worth starting with simpler models before considering those involving interactions.
If we add the interaction term to `df1` manually and then check it against `s1` with a cross table:
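A sketch of both steps, assuming hypothetical values for `df1` and a made-up column name `te1pa1` for the interaction:

```r
# Hypothetical toy data (values are assumptions for illustration)
df1 <- data.frame(
  s1  = c(0, 1, 0, 1, 0, 1),
  te1 = c(0, 0, 0, 1, 1, 1),
  pa1 = c(0, 0, 1, 0, 0, 0)
)

# For 0/1 predictors the interaction is just the product of the two columns
df1$te1pa1 <- df1$te1 * df1$pa1

# Cross table of status against the interaction term
tab_int <- xtabs(~ s1 + te1pa1, data = df1)
tab_int
```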
We can see that it's a useless classifier, i.e. it does not help predict status `s1`.

When combining all 3 terms, the fitter does manage to produce a numerical value for `te1` and `pa1` even though `pa1` is a perfect predictor as above. However, a look at the values of the coefficients and their errors shows them to be implausible.

Edit @JMarcelino: If you look at the output of the first `coxph` model in the example, you'll see a warning message, which is likely the same error you're getting and is due to this problem of classification. Also, your third cross table
`xtabs(~ tecnologia+pais, data=dados)` is not as important as the table of `status` by the interaction term. You could add the interaction term manually first, as in the example above, then check the cross table. Or you could build that cross table from the interaction of the two factors directly.

That said, I notice one of the cells in your third table has a zero (`conv`, `PT`), meaning you have no observations with this combination of predictors. This is going to cause problems when trying to fit.

In general, the outcome should have some values for all levels of the predictors, and the predictors should not classify the outcome as exactly all or nothing or 50/50.
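To see why the empty (`conv`, `PT`) cell matters, here is a sketch that rebuilds only the predictors from the counts in the question's third cross table. It assumes the `doppler` rows were dropped in `dados_temp`, since 69 + 30 + 28 = 127 matches the `n` that `coxph` reports. The name `preds` is made up for illustration, and no status values are invented:

```r
# Predictor combinations taken from the question's table (doppler excluded, an assumption)
preds <- data.frame(
  tecnologia = factor(rep(c("conv", "gps", "gps"), times = c(69, 30, 28))),
  pais       = factor(rep(c("ES",   "ES",  "PT"),  times = c(69, 30, 28)))
)
xtabs(~ tecnologia + pais, data = preds)

# Design matrix for main effects plus interaction
X <- model.matrix(~ tecnologia * pais, data = preds)
colnames(X)

# With no conv/PT rows, every PT row is also a gps row,
# so the interaction column can be compared directly with the paisPT column
same_cols <- all(X[, "paisPT"] == X[, "tecnologiagps:paisPT"])
same_cols

# Rank of the design matrix versus its number of columns
c(rank = qr(X)$rank, columns = ncol(X))
```

If the two columns coincide, the model cannot estimate both effects separately, which would be consistent with the `NA` coefficient and the `variable 3` mentioned in the warning.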
Edit 2 @user75782131: Yes, generally speaking, `xtabs` or a similar cross table should be run for models where the outcome and predictors are discrete, i.e. have a limited number of levels. If 'perfect classification' is present then a predictive model / regression may not be appropriate. This is true, for example, for logistic regression (where the outcome is binary) as well as Cox's model.
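The logistic regression case can be sketched the same way. The data frame `d` below is hypothetical and built so that `x` separates `y` perfectly. The fit is expected to produce warnings and a very large coefficient with an even larger standard error:

```r
# Hypothetical data: y is 1 exactly when x is 1
d <- data.frame(
  y = c(0, 0, 0, 1, 1, 1),
  x = c(0, 0, 0, 1, 1, 1)
)

# The cross table shows the perfect classification before any model is fitted
tab_glm <- xtabs(~ y + x, data = d)
tab_glm

# Logistic regression on perfectly separated data
fit_glm <- glm(y ~ x, family = binomial, data = d)
summary(fit_glm)$coefficients
```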