I'm having some trouble using coxph(). I have two categorical variables, "tecnologia" and "pais", and I want to evaluate the possible interaction effect of "pais" on "tecnologia". "tecnologia" is a factor with 2 levels: gps and convencional. "pais" also has 2 levels: PT and ES. I have no idea why this warning keeps appearing. Here's the code and the output:
cox_AC<-coxph(Surv(dados_temp$dias_seg,dados_temp$status)~tecnologia*pais,data=dados_temp)
Warning message:
In coxph(Surv(dados_temp$dias_seg, dados_temp$status) ~ tecnologia *  :
  X matrix deemed to be singular; variable 3
> cox_AC
Call:
coxph(formula = Surv(dados_temp$dias_seg, dados_temp$status) ~ 
    tecnologia * pais, data = dados_temp)
                       coef exp(coef) se(coef)     z     p
tecnologiagps        -0.152     0.859    0.400 -0.38 7e-01
paisPT                1.469     4.345    0.406  3.62 3e-04
tecnologiagps:paisPT     NA        NA    0.000    NA    NA
Likelihood ratio test=23.8  on 2 df, p=6.82e-06  n= 127, number of events= 64 
I'm opening another question about this subject, although I asked a similar one some months ago, because I'm facing the same problem again with other data. This time I'm sure it's not a data-related problem.
Can somebody help me? Thank you
UPDATE: The problem does not seem to be perfect classification:
> xtabs(~status+tecnologia,data=dados)  
      tecnologia
status conv doppler gps  
     0   39       6  24  
     1   30       3  34 
> xtabs(~status+pais,data=dados)  
      pais  
status ES PT  
     0 71  8  
     1 49 28  
 > xtabs(~tecnologia+pais,data=dados)
          pais  
tecnologia ES PT
   conv    69  0
   doppler  1  8
   gps     30 28
 
                        
Here's a simple example which seems to reproduce your problem:
Now let's look for 'perfect classification' like so:
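A minimal sketch of what such an example might look like, using made-up data rather than the original values. The data frame df1 and its columns t1 (time), s1 (status), te1 and pa1 (binary predictors) are assumptions chosen to match the names used in this answer:

```r
# Hypothetical toy data, NOT the original example's values:
# t1 = follow-up time, s1 = status (1 = event, 0 = censored),
# te1 and pa1 = binary predictors standing in for tecnologia and pais.
df1 <- data.frame(
  t1  = 1:8,
  s1  = c(1, 0, 1, 0, 1, 0, 1, 0),
  te1 = c(0, 0, 0, 0, 1, 1, 1, 1),
  pa1 = c(0, 1, 0, 0, 0, 1, 0, 1)
)

# Cross-tabulate status against each predictor
tab_te1 <- xtabs(~ s1 + te1, data = df1)
tab_pa1 <- xtabs(~ s1 + pa1, data = df1)
print(tab_te1)
print(tab_pa1)
```

In this toy data the cell for s1 = 1 and pa1 = 1 is empty, which is the pattern to look for in such tables.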
Note that a value of 1 for pa1 exactly predicts having a status s1 equal to 0. That is to say, based on your data, if you know that pa1==1 then you can be sure that s1==0. Thus fitting Cox's model is not appropriate in this setting and will result in numerical errors. This can be seen by fitting the model, which gives a warning.
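As a rough illustration of that fit, assuming the survival package is installed and using made-up toy data (the values are hypothetical; only the column names follow this answer), any warnings can be collected and inspected like this:

```r
library(survival)

# Hypothetical toy data: whenever pa1 == 1, status s1 is 0
df1 <- data.frame(
  t1  = 1:8,
  s1  = c(1, 0, 1, 0, 1, 0, 1, 0),
  te1 = c(0, 0, 0, 0, 1, 1, 1, 1),
  pa1 = c(0, 1, 0, 0, 0, 1, 0, 1)
)

# Collect warnings in a vector instead of letting them scroll past
warns <- character(0)
fit_pa1 <- withCallingHandlers(
  coxph(Surv(t1, s1) ~ pa1, data = df1),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(warns)          # any warning messages raised during fitting
print(coef(fit_pa1))  # estimated log hazard ratio for pa1
```

On data like this the estimate for pa1 is typically a large negative number with a very large standard error; the exact wording of the warning may differ between versions of survival.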
It's important to look at these cross tables before fitting models. Also it's worth starting with simpler models before considering those involving interactions.
If we add the interaction term to df1 manually and then check it against status with a cross table, we can see that it's a useless classifier, i.e. it does not help predict status s1.
When combining all 3 terms, the fitter does manage to produce a numerical value for te1 and pa1 even though pa1 is a perfect predictor as above. However, a look at the values for the coefficients and their errors shows them to be implausible.
Edit @JMarcelino: If you look at the output from the first coxph model in the example, you'll see a warning message, which is likely the same error you're getting and is due to this problem of classification. Also, your third cross table xtabs(~ tecnologia+pais, data=dados) is not as important as the table of status by interaction term. You could add the interaction term manually first as in the example above then check the cross table, or you could build that table in one step.
That said, I notice one of the cells in your third table has a zero (conv, PT), meaning you have no observations with this combination of predictors. This is going to cause problems when trying to fit.
In general, the outcome should have some values for all levels of the predictors, and the predictors should not classify the outcome as exactly all or nothing or 50/50.
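One way to see why an empty cell matters is sketched below. The data frame dados_sim is hypothetical (it is not the asker's data); it only copies the pattern of having no rows with tecnologia == "conv" and pais == "PT". The sketch builds the status-by-interaction table in one call and then inspects the design matrix that tecnologia * pais expands to:

```r
# Hypothetical data with the same empty cell as in the question:
# no observations with tecnologia == "conv" and pais == "PT".
dados_sim <- data.frame(
  status     = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 0),
  tecnologia = factor(c("conv", "conv", "conv", "conv",
                        "gps", "gps", "gps", "gps", "gps", "gps")),
  pais       = factor(c("ES", "ES", "ES", "ES",
                        "ES", "ES", "PT", "PT", "PT", "PT"))
)

# Status by interaction term, built in one step
tab_int <- xtabs(~ status + interaction(tecnologia, pais), data = dados_sim)
print(tab_int)

# Design matrix that the formula tecnologia * pais expands to
X <- model.matrix(~ tecnologia * pais, data = dados_sim)
print(colnames(X))

# With the (conv, PT) cell empty, every PT row is also a gps row, so the
# interaction column is an exact copy of the paisPT column.
same_cols <- all(X[, "tecnologiagps:paisPT"] == X[, "paisPT"])
print(same_cols)
```

When two columns of the design matrix are identical, the third coefficient cannot be estimated separately from the second, which is the kind of redundancy a "singular; variable 3" warning with an NA coefficient points to.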
Edit 2 @user75782131: Yes, generally speaking, xtabs or a similar cross table should be checked for models where the outcome and predictors are discrete, i.e. have a limited number of levels. If 'perfect classification' is present then a predictive model / regression may not be appropriate. This is true, for example, for logistic regression (where the outcome is binary) as well as Cox's model.