How do I completely exclude a variable category in R using subset or any other function?


I'm trying to create a contingency table to perform a chi-square test. I have a variable (group) with 3 categories (N, Y, Unknown). I want to exclude one of these categories (Unknown), which has very small numbers, so I can create a 2x2 contingency table with another variable (parous).

I tried using the subset function, as well as filter from dplyr, but my excluded category is still included with a count of 0 and I can't seem to get rid of it. I can't run a chi-square test, I assume because of the count of 0 in the Unknown column. How do I exclude this column completely?

My code is as follows:

data$group <- factor(data$group, 
                 levels=c("No", "Yes", "Unknown"), 
                 labels=c("LR", "HR", "Unknown"))
data2 <- subset(data, !(group == "Unknown"))
table(data2$group) 

> LR   HR   Unknown
  200  40    0 

table(data2$parous, data2$group) 

>     LR   HR   Unknown 
 No   140    10     0
 Yes  60     30     0 

There are 1 best solutions below

Meryl Waurich On

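I had a similar problem. I filtered the same way you did and got a column of zeros in my contingency table. The column appears because subset() removes the rows, but the factor still keeps "Unknown" as a level. To solve this, I built the contingency table (from your data2) and identified the column with all zero counts using this R code:

tab <- table(data2$parous, data2$group)
zero_column <- which(colSums(tab) == 0)

Note that colSums() has to be applied to the table, not to the data frame data2 itself, because it needs numeric input. Then I created a new contingency table that did not contain the zero column:

tab_filtered <- tab[, -zero_column]

The new contingency table should not have a column of zeros. This only works when at least one zero column is found; if zero_column is empty, the negative index drops every column.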
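
An alternative that avoids the zero column in the first place is to drop the unused factor level right after subsetting. The sketch below uses base R's droplevels(). The data frame is made up to reproduce the counts in the question; the three "Unknown" rows and their parous values are assumptions for illustration only.

```r
# Hypothetical data reproducing the counts shown in the question.
# The "Unknown" rows (2 No, 1 Yes) are invented for illustration.
data <- data.frame(
  group = factor(rep(c("LR", "HR", "Unknown"), times = c(200, 40, 3)),
                 levels = c("LR", "HR", "Unknown")),
  parous = factor(c(rep(c("No", "Yes"), times = c(140, 60)),  # LR rows
                    rep(c("No", "Yes"), times = c(10, 30)),   # HR rows
                    rep(c("No", "Yes"), times = c(2, 1))))    # Unknown rows
)

# Remove the rows, then remove the now-empty "Unknown" level.
data2 <- subset(data, group != "Unknown")
data2$group <- droplevels(data2$group)

# The table is now 2x2, so the chi-square test can run.
tab <- table(data2$parous, data2$group)
print(tab)

result <- chisq.test(tab)
print(result)
```

Calling droplevels(data2) on the whole data frame should also work if several factor columns carry unused levels.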