Frequency table for repeated measures


original df:

ID <- c(1,1,1,1,2,2,2,2,3,3,3,3,3)
DX <- c("A","A","B","B","C","C","A","B","A","A","A","B","B")
df <- data.frame(ID,DX)

   ID DX
1   1  A
2   1  A
3   1  B
4   1  B
5   2  C
6   2  C
7   2  A
8   2  B
9   3  A
10  3  A
11  3  A
12  3  B
13  3  B

I am trying to make a frequency table for DX.

tblFun <- function(x){
  tbl <- table(x)
  res <- cbind(tbl,round(prop.table(tbl)*100,2))
  colnames(res) <- c('Count','Percentage')
  res
}

do.call(rbind,lapply(df[2],tblFun))

  Count Percentage
A     6      46.15
B     5      38.46
C     2      15.38

The calculation above uses 13 (the number of observations) as the denominator, but since there are only 3 distinct IDs, the denominator should be 3. That is, 3 people had A, 3 people had B, and 1 person had C, so the result should look like this:

  Count Percentage
A     3     100.00
B     3     100.00
C     1      33.33

How can I transform the data frame so the calculation could be done like the above?

I would appreciate all the help there is! Thanks!


akrun On BEST ANSWER

After creating the table object, convert it to a logical matrix (TRUE where an ID has at least one observation of that DX), then apply rowSums and rowMeans to it:

m1 <- table(df[2:1]) > 0
cbind(Count = rowSums(m1), Percentage = round(rowMeans(m1)* 100, 2))

-output

  Count Percentage
A     3     100.00
B     3     100.00
C     1      33.33
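
To make the intermediate steps visible, here is a minimal sketch that rebuilds the question's data and names each object along the way. The variable names `tab` and `res` are introduced here for illustration only; the logic is the same as in the two lines above.

```r
# Rebuild the example data from the question
ID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3)
DX <- c("A", "A", "B", "B", "C", "C", "A", "B", "A", "A", "A", "B", "B")
df <- data.frame(ID, DX)

# Cross-tabulate DX (rows) by ID (columns); each cell is an observation count
tab <- table(df[2:1])

# TRUE where an ID has at least one observation of that DX
m1 <- tab > 0

# Row sums count the distinct IDs per DX; row means give the share of all IDs
res <- cbind(Count = rowSums(m1), Percentage = round(rowMeans(m1) * 100, 2))
print(res)
```

Because `m1` has one column per distinct ID, `rowMeans(m1)` divides by the number of IDs automatically, so no separate denominator has to be computed.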

DrEspresso On

Using the dplyr package and the pipe operator %>%:

library(dplyr) # .by requires dplyr >= 1.1.0

# Number of distinct IDs
nID <- n_distinct(df$ID)

df %>%
  # Remove duplicate (ID, DX) rows
  distinct() %>%
  # Count the distinct IDs within each DX group
  summarise(Count = n(), .by = DX) %>%
  # Calculate the percentage of IDs, rounded to 2 decimals
  mutate(Percentage = round(Count / nID * 100, 2))

P.S.: To sort the output by the "Count" column in descending order, append the following to the pipeline (add a %>% after the last line of the code above):

      ...   %>%
  # Sort by frequency
  arrange(desc(Count))
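
For readers without dplyr, the same idea (drop duplicate ID/DX pairs, count per DX, divide by the number of distinct IDs) can be sketched in base R. This is an alternative formulation, not part of the answer above, and the names `pairs`, `n_id`, `counts` and `res` are introduced here for illustration.

```r
# Rebuild the example data from the question
ID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3)
DX <- c("A", "A", "B", "B", "C", "C", "A", "B", "A", "A", "A", "B", "B")
df <- data.frame(ID, DX)

# Keep one row per (ID, DX) pair
pairs <- unique(df)

# Number of distinct IDs, used as the denominator
n_id <- length(unique(df$ID))

# Count how many distinct IDs fall under each DX
counts <- table(pairs$DX)

res <- data.frame(
  DX = names(counts),
  Count = as.vector(counts),
  Percentage = round(as.vector(counts) / n_id * 100, 2)
)

# Sort by frequency, highest first
res <- res[order(-res$Count), ]
print(res)
```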

TarJae On

Something like this:

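library(dplyr) # >= 1.1.0
df %>%
  # Count the distinct IDs within each DX group
  summarize(Count = n_distinct(ID), .by = DX) %>%
  # Divide by the total number of distinct IDs
  mutate(Percentage = round(Count / n_distinct(df$ID) * 100, 2))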

  DX Count Percentage
1  A     3     100.00
2  B     3     100.00
3  C     1      33.33
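
One detail worth checking in any of these approaches is the denominator: it should be the number of distinct IDs, which is not always equal to the largest per-DX count. The two coincide in the question's data only because A and B each occur in every ID. A minimal base R sketch with hypothetical data (`df2` is made up for illustration and does not come from the question) shows the difference:

```r
# Hypothetical data (not from the question): no DX occurs in all 3 IDs
df2 <- data.frame(ID = c(1, 1, 2, 3), DX = c("A", "B", "B", "C"))

# Distinct IDs per DX
count <- tapply(df2$ID, df2$DX, function(id) length(unique(id)))

# Two candidate denominators
n_id <- length(unique(df2$ID))  # number of distinct IDs
max_count <- max(count)         # largest per-DX count

pct_by_ids <- round(count / n_id * 100, 2)
pct_by_max <- round(count / max_count * 100, 2)

print(rbind(pct_by_ids, pct_by_max))
```

Here B occurs in 2 of 3 IDs, so dividing by the number of distinct IDs gives 66.67, while dividing by the largest count would report 100.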