Summarizing count data as proportion in a data.frame

dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))
df_dummy <- data.frame(Question = c("Q1", "Q2", "Q3"),
                       X1 = c(2/4, 3/4, 2/4),
                       X0 = c(2/4, 1/4, 2/4))

> dummy
  Q1 Q2 Q3
1  0  1  0
2  1  1  1
3  0  0  1
4  1  1  0

> df_dummy
  Question   X1   X0
1       Q1 0.50 0.50
2       Q2 0.75 0.25
3       Q3 0.50 0.50

I have some data (dummy) with binary responses to Q1, Q2, and Q3. I want to summarize it in the format shown in df_dummy, where, for each question, column X1 gives the proportion of people who answered 1 and column X0 gives the proportion of people who answered 0. I tried prop.table, but that didn't return the desired result.


1
On BEST ANSWER

Another way is counting the proportion of 1s and then deducing from that the proportion of 0s:

X1 <- colSums(dummy==1)/nrow(dummy)
df_dummy <- data.frame(X1, X0=1-X1)
df_dummy
#     X1   X0
#Q1 0.50 0.50
#Q2 0.75 0.25
#Q3 0.50 0.50

NB, inspired by @akrun's idea of colMeans: you can also define X1 with colMeans instead of dividing colSums by the number of rows:

X1 <- colMeans(dummy==1)
df_dummy <- data.frame(X1, X0=1-X1)
df_dummy
#     X1   X0
#Q1 0.50 0.50
#Q2 0.75 0.25
#Q3 0.50 0.50
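
The result above carries the question names as row names, whereas the df_dummy in the question has an explicit Question column. A minimal sketch of one way to get that layout, assuming every column of dummy holds only 0s and 1s (the unname calls are only there to keep default row names):

```r
dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))

# Proportion of 1s per question, as a named vector
X1 <- colMeans(dummy == 1)

# Move the names into a Question column and drop them from the values
df_dummy <- data.frame(Question = names(X1),
                       X1 = unname(X1),
                       X0 = 1 - unname(X1))
df_dummy
#   Question   X1   X0
# 1       Q1 0.50 0.50
# 2       Q2 0.75 0.25
# 3       Q3 0.50 0.50
```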
1
On

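Another way to do this is with do.call and lapply. Note that this returns counts rather than proportions (row 1 holds the 0s, row 2 the 1s); dividing the result by nrow(dummy) gives proportions:

do.call(cbind, lapply(dummy, function(x) data.frame(table(x))[, 2]))
#      Q1 Q2 Q3
# [1,]  2  1  2
# [2,]  2  3  2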
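
The call above yields counts, so one more step is needed to reach the layout asked for. A rough sketch of that step, assuming both 0 and 1 occur in every column (otherwise the per-column vectors differ in length and cbind no longer lines up); the names counts and props are just illustrative:

```r
dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))

# Counts per question: row 1 = number of 0s, row 2 = number of 1s
counts <- do.call(cbind, lapply(dummy, function(x) data.frame(table(x))[, 2]))

# Convert to proportions and put one question per row
props <- t(counts / nrow(dummy))
colnames(props) <- c("X0", "X1")
props
#      X0   X1
# Q1 0.50 0.50
# Q2 0.25 0.75
# Q3 0.50 0.50
```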
2
On

We can use apply with MARGIN = 2 and divide the count of each value by the length of the column:

t(apply(dummy, 2, function(x) table(x)/length(x)))

#     0    1
#Q1 0.50 0.50
#Q2 0.25 0.75
#Q3 0.50 0.50
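
One caveat with table-based approaches like this: if a column happens to contain only 0s or only 1s, table(x) returns a single cell and the results no longer share a common shape. A possible workaround, sketched here on a hypothetical dummy2 where everyone answered 1 to Q2, is to fix the levels with factor before tabulating:

```r
# Hypothetical data: Q2 contains no 0s at all
dummy2 <- data.frame(Q1 = c(0, 1, 0, 1),
                     Q2 = c(1, 1, 1, 1))

# Fixing the levels makes table() report a zero count for the missing value,
# so every column yields a length-2 result
props <- t(apply(dummy2, 2, function(x) table(factor(x, levels = 0:1)) / length(x)))
props
```

With the levels fixed, the Q2 row should come out as 0 for the "0" column and 1 for the "1" column.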
0
On

A tidyverse option:

library(tidyr)
library(janitor)

dummy %>%
  gather(question, val) %>%    # reshape to long form
  tabyl(question, val) %>%    # make crosstab table
  adorn_percentages("row") %>%
  clean_names()

 question   x0   x1
       Q1 0.50 0.50
       Q2 0.25 0.75
       Q3 0.50 0.50
2
On

We can do this with table and prop.table

t(sapply(dummy, function(x) prop.table(table(x))))
#     0    1
#Q1 0.50 0.50
#Q2 0.25 0.75
#Q3 0.50 0.50

Or a more efficient approach is to call table once

prop.table(table(stack(dummy)[2:1]),1)
#   values
#ind     0    1
#  Q1 0.50 0.50
#  Q2 0.25 0.75
#  Q3 0.50 0.50

Or another option is colMeans (inspired by @Cath's use of colSums):

X0 <- colMeans(!dummy)
data.frame(X1 = 1 - X0, X0)
#    X1   X0
#Q1 0.50 0.50
#Q2 0.75 0.25
#Q3 0.50 0.50
0
On

Less elegantly than in the answer above:

d <- t(dummy)
cbind(X0 = (ncol(d) - rowSums(d)) / ncol(d), X1 = rowSums(d) / ncol(d))

Or, to avoid computing the same stuff twice, and to get a data frame:

d <- t(dummy)
i <- ncol(d)
j <- rowSums(d)
data.frame(Question = rownames(d), X0 = (i - j) / i, X1 = j / i)

There you go:

   Question   X0   X1
Q1       Q1 0.50 0.50
Q2       Q2 0.25 0.75
Q3       Q3 0.50 0.50