Compute p-values across all columns of (possibly large) matrices in R

Is there a more efficient/faster way to compare two matrices column by column and compute p-values using a t-test for no difference in means (possibly switching to chisq.test when necessary)?

Here is my solution:

## generate fake data (e.g., from treatment and control data)
z0 <- matrix(rnorm(100),10,10)
z1 <- matrix(rnorm(100, mean=1.1, sd=2),10,10)

## function to compare columns (bloody for loop)
compare.matrix <- function(z0, z1){
  pval <- numeric(ncol(z0)) ## initialize

  for(i in seq_len(ncol(z0))){ ## compare columns
    if (length(unique(z1[, i])) == 2){
      ## if var is categorical (binary), use a chi-squared test
      index <- c(rep(0, nrow(z0)), rep(1, nrow(z1)))
      xx <- c(z0[, i], z1[, i])
      pval[i] <- chisq.test(table(xx, index), simulate.p.value=TRUE)$p.value
    } else {
      ## otherwise, t-test for no difference in means
      pval[i] <- t.test(z1[, i], z0[, i])$p.value
    }
  }
  return(pval)  
}
compare.matrix(z0, z1)

Best answer

Here's one way using dplyr (with melt from reshape2). It would probably be better to combine the melt and merge steps into a single step if you've got large matrices, but I separated them for clarity. I think the chi-squared case would be a fairly simple extension.

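library(reshape2)  # provides melt()
library(dplyr)

## keep Var1 (the row index) so that merge() pairs rows one-to-one;
## merging on Var2 alone would cross every z0 value with every z1 value
## within a column and inflate the sample size seen by t.test()
z0_melt = melt(z0, value.name='z0')
z1_melt = melt(z1, value.name='z1')
all_df = merge(z0_melt, z1_melt)  # joins on Var1 and Var2; assumes equal row counts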

all_df %>%
  group_by(Var2) %>%
  summarize(p = t.test(z0, z1)$p.value)
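
If the matrices are really large, it may be worth skipping the per-column calls to t.test altogether. Below is a minimal sketch in base R that computes the Welch t-test (the default in t.test) for all columns at once from column means and variances. The name welch_cols is a hypothetical helper, not something from a package, and the sketch assumes numeric matrices with the same number of columns and no missing values. It does not handle the chi-squared case.

```r
## Vectorised Welch two-sample t-test, one test per column.
## welch_cols is a hypothetical helper name (an assumption, not a package function).
## Assumes numeric matrices with equal column counts and no NA values.
welch_cols <- function(z0, z1) {
  stopifnot(ncol(z0) == ncol(z1))
  n0 <- nrow(z0)
  n1 <- nrow(z1)
  m0 <- colMeans(z0)
  m1 <- colMeans(z1)

  ## per-column sample variance divided by sample size (squared standard errors)
  v0 <- colSums(sweep(z0, 2, m0)^2) / (n0 - 1) / n0
  v1 <- colSums(sweep(z1, 2, m1)^2) / (n1 - 1) / n1
  se2 <- v0 + v1

  tstat <- (m1 - m0) / sqrt(se2)

  ## Welch-Satterthwaite degrees of freedom
  df <- se2^2 / (v0^2 / (n0 - 1) + v1^2 / (n1 - 1))

  ## two-sided p-values
  2 * pt(-abs(tstat), df)
}

## compare against the loop over t.test() on fake data
set.seed(1)
z0 <- matrix(rnorm(100), 10, 10)
z1 <- matrix(rnorm(100, mean = 1.1, sd = 2), 10, 10)

p_fast <- welch_cols(z0, z1)
p_loop <- sapply(seq_len(ncol(z0)), function(i) t.test(z1[, i], z0[, i])$p.value)
stopifnot(isTRUE(all.equal(p_fast, p_loop)))
```

Binary columns could still be sent through chisq.test in a loop over just those columns, which should be cheap if there are only a few of them.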