Getting Conditional Subset of Contingency Table

I have some data that I'm summarising as contingency tables. There are several entries in the data which are either missing or error values. Constructing the tables using table, as per the code below, is very useful as I can see by inspection how much of the data is missing or nonsense.

Knowing in advance which data items I want to retain, how can I select a subset of the data? For example, a small table with a portion of the data is:

my.tab <- table(sm.pos.grp, sm.neg.grp)

      sm.neg.grp
sm.pos.grp  zz  Zz  ZZ
        00   0   9   1
        zz   0   0  31
        Zz  11   5   7
        ZZ   0  77 211

I'm only interested in the zz, ZZ, and Zz entries, so I can extract the relevant subset of the table like this:

my.tab[2:4, ]

          sm.neg.grp
sm.pos.grp zz Zz  ZZ
        zz  0  0  31
        Zz 11  5   7
        ZZ  0 77 211

However, the full data set is more complex:

        full.pos.grp
full.neg.grp   00   zz   zZ   Zz   ZZ ZTRUE TRUEz TRUEZ TRUEFalse
   00           0    0    0    0    4     0     0     0         0
   zz           5  126  140  151  258    15     0     0         0
   zZ           3  123  547    0  616     0     0     0         0
   Zz           2  120    0  513  572     0     0     2         0
   ZZ          19  277  642  293 2286     0     5    28         0
   TRUEz        0    0    0    1    3     0     0     0         0
   TRUEZ        0    9    0    2   18     0     1    16         1
   TRUEFalse    0    0    0    0    0     1     0     1         0

How can I subset the table by reference only to zz, Zz, zZ and ZZ? Converting to a data frame using as.data.frame(my.tab) loses the table structure, and I can't seem to get the syntax right for tapply (e.g. I tried things like tapply(sm.neg.grp, sm.pos.grp, sum) without success). Any help much appreciated!

Here is the dput output for both tables:

> dput(my.tab)
structure(c(0L, 0L, 11L, 0L, 9L, 0L, 5L, 77L, 1L, 31L, 7L, 211L), .Dim = c(4L, 
3L), .Dimnames = structure(list(sm.pos.grp = c("00", "zz", "Zz", 
"ZZ"), sm.neg.grp = c("zz", "Zz", "ZZ")), .Names = c("sm.pos.grp", 
"sm.neg.grp")), class = "table")  

> dput(the.table)
structure(c(0L, 5L, 3L, 2L, 19L, 0L, 0L, 0L, 0L, 126L, 123L, 
120L, 277L, 0L, 9L, 0L, 0L, 140L, 547L, 0L, 642L, 0L, 0L, 0L, 
0L, 151L, 0L, 513L, 293L, 1L, 2L, 0L, 4L, 258L, 616L, 572L, 2286L, 
3L, 18L, 0L, 0L, 15L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
5L, 0L, 1L, 0L, 0L, 0L, 0L, 2L, 28L, 0L, 16L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 1L, 0L), .Dim = 8:9, .Dimnames = structure(list(full.case.grp = c("00", 
"zz", "zZ", "Zz", "ZZ", "TRUEz", "TRUEZ", "TRUEFalse"), full.ctrl.grp = c("00", 
"zz", "zZ", "Zz", "ZZ", "ZTRUE", "TRUEz", "TRUEZ", "TRUEFalse")), 
.Names = c("full.neg.grp", "full.pos.grp")), class = "table")

Best answer

To subset your table by reference (i.e. by row and column names), you can pass the names directly inside the square brackets.

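# Labels to keep, in the order they should appear
n <- c("zz", "Zz", "zZ", "ZZ")
# Use the full table here: my.tab has no "zZ" level, so my.tab[n, n] would fail
the.table[n, n]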

            full.pos.grp
full.neg.grp  zz  Zz  zZ   ZZ
          zz 126 151 140  258
          Zz 120 513   0  572
          zZ 123   0 547  616
          ZZ 277 293 642 2286
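
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the.table from the dput output in the question (with the dimnames written as a plain named list, which is a simplification of the original structure call) and then subsets by name. The intersect guard and the names keep, keep.rows, keep.cols and sub.tab are my own additions for illustration: indexing by a label that is absent from a dimension raises a "subscript out of bounds" error, so filtering the labels first may be useful when the two dimensions do not share exactly the same levels.

```r
# Rebuild the full table from the dput() output given in the question.
the.table <- structure(
  c(0L, 5L, 3L, 2L, 19L, 0L, 0L, 0L, 0L, 126L, 123L,
    120L, 277L, 0L, 9L, 0L, 0L, 140L, 547L, 0L, 642L, 0L, 0L, 0L,
    0L, 151L, 0L, 513L, 293L, 1L, 2L, 0L, 4L, 258L, 616L, 572L, 2286L,
    3L, 18L, 0L, 0L, 15L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
    5L, 0L, 1L, 0L, 0L, 0L, 0L, 2L, 28L, 0L, 16L, 1L, 0L, 0L, 0L,
    0L, 0L, 0L, 1L, 0L),
  .Dim = 8:9,
  .Dimnames = list(
    full.neg.grp = c("00", "zz", "zZ", "Zz", "ZZ",
                     "TRUEz", "TRUEZ", "TRUEFalse"),
    full.pos.grp = c("00", "zz", "zZ", "Zz", "ZZ",
                     "ZTRUE", "TRUEz", "TRUEZ", "TRUEFalse")
  ),
  class = "table"
)

# Labels we want to retain (hypothetical variable name).
keep <- c("zz", "Zz", "zZ", "ZZ")

# Keep only labels that actually exist in each dimension, so a missing
# level does not trigger "subscript out of bounds".
keep.rows <- intersect(keep, rownames(the.table))
keep.cols <- intersect(keep, colnames(the.table))

# Subset by name; rows and columns come back in the order given in keep.
sub.tab <- the.table[keep.rows, keep.cols]
print(sub.tab)
```

Because the selection is by label rather than by position, it keeps working if the levels appear in a different order or if extra error categories turn up in the data.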