T-test comparing multiple columns to other columns



I am relatively new to R and need some help with my data analysis. In the attached table, the Master Protein Accession column lists proteins that are increased or decreased in the cortex (C) under three conditions: control (C), dehydration (D) and rehydration (R). Each condition has 5 samples: CC (1, 2, 3, 4 and 5), CD (1, 2, 3, 4 and 5) and CR (1, 2, 3, 4 and 5). I need to run a t-test comparing the Cortex Control samples (CC1 to CC5) against the Cortex Dehydration samples (CD1 to CD5) for every protein, so that when I run the code, the row 1 CC1 value gets t-tested against the row 1 CD1 value, the row 2 CC1 value against the row 2 CD1 value, and so on.

I tried

apply(allcor1, function(x){t.test(x[2:12],x[4:14], nchar)})

but it gives me

Error in match.fun(FUN) : argument "FUN" is missing, with no default


There is 1 best solution below


The challenge you have is that the data is too "wide": each protein is represented as a single row even though it holds five measurements per condition.

The problem gets easier if you reshape it. Here I'll use tidyr's pivot functions, as well as separate.

library(dplyr)
library(tidyr)

# Removing the "sd" columns,
# and renaming first column to "protein" to be easier to work with
longer_data <- yourdata %>%
  select(-starts_with("sd")) %>%
  rename(protein = 1) %>%
  # Pivot all columns besides protein into one column, condition_sample
  pivot_longer(cols = -protein,
               names_to = "condition_sample",
               values_to = "value") %>%
  # Split CC1, CD2, etc. into two columns after the second character
  separate(condition_sample, c("condition", "sample"), sep = 2) %>%
  # Make them wide again by condition
  pivot_wider(names_from = condition, values_from = value)

I can't test without a reproducible example, but this should give you a table with the columns protein and sample (1-5), plus one value column per condition (CC, CD and CR).
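Since the original table isn't available, here is a minimal sketch of the reshaping on made-up data. The data frame toy, its accession column and the protein IDs P1 to P3 are assumptions for illustration only; just the CC1 to CR5 naming follows the question. The toy table has no "sd" columns, so the select() step is left out.

```r
library(dplyr)
library(tidyr)

set.seed(1)

# Hypothetical stand-in for the real table: 3 proteins, 5 samples per condition
values <- matrix(rnorm(45, mean = 10), nrow = 3)
colnames(values) <- paste0(rep(c("CC", "CD", "CR"), each = 5), 1:5)
toy <- data.frame(accession = c("P1", "P2", "P3"), values)

toy_long <- toy %>%
  rename(protein = 1) %>%
  # One row per protein and column (CC1, CC2, ..., CR5)
  pivot_longer(cols = -protein, names_to = "condition_sample") %>%
  # "CC1" becomes condition = "CC", sample = "1"
  separate(condition_sample, c("condition", "sample"), sep = 2) %>%
  # One row per protein and sample, one column per condition
  pivot_wider(names_from = condition, values_from = value)

names(toy_long)
nrow(toy_long)
```

With 3 proteins and 5 samples each, the result should have 15 rows, which is a quick way to check that the split after the second character worked on your real column names.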

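At this point, the data is in a shape that works well for statistical modeling, such as a paired t-test. I use dplyr here to run a t-test of CC against CD within each protein, and the broom package to tidy the results. Note that paired = TRUE assumes sample 1 in CC is matched to sample 1 in CD (for example, the same animal); if the samples are independent, drop paired = TRUE to get Welch's two-sample t-test instead.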

library(broom)

longer_data %>%
  group_by(protein) %>%
  summarize(tidied_model = list(tidy(t.test(CC, CD, paired = TRUE)))) %>%
  unnest(tidied_model)

This would give you one row per protein, with columns estimate, statistic, and p.value, among others (parameter, conf.low, conf.high, method, alternative).
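As for the original error: apply() takes the array first, then MARGIN, then FUN, so in apply(allcor1, function(x){...}) the function is matched to MARGIN and FUN is left missing. If you would rather stay in base R, a row-wise version could look roughly like this. It is only a sketch on made-up data: the toy allcor1 stands in for the real table, and the column names CC1 to CD5 and the accession column are assumptions based on the description in the question.

```r
set.seed(1)

# Hypothetical stand-in for the real table
values <- matrix(rnorm(45, mean = 10), nrow = 3)
colnames(values) <- paste0(rep(c("CC", "CD", "CR"), each = 5), 1:5)
allcor1 <- data.frame(accession = c("P1", "P2", "P3"), values)

cc_cols <- paste0("CC", 1:5)
cd_cols <- paste0("CD", 1:5)

# Keep only the numeric columns; calling apply() on a data frame that
# includes a character column would coerce every value to character
num <- as.matrix(allcor1[, c(cc_cols, cd_cols)])

# MARGIN = 1 runs the function once per row (i.e. once per protein)
p_values <- apply(num, MARGIN = 1, FUN = function(x) {
  t.test(x[cc_cols], x[cd_cols])$p.value
})

result <- data.frame(protein = allcor1$accession, p.value = p_values)
result
```

By default t.test() runs an unpaired Welch test; add paired = TRUE inside the call if you want the same paired comparison as in the dplyr version above.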