Conducting paired t-test without the raw data

In R, I would like to conduct a paired-sample t-test for two samples. I don't have the raw data. I only have three pieces of information: 1) the means of both groups, 2) their 95% confidence intervals, and 3) their N's. Here is the data:

mean1 <- 0.5012997
mean2 <- 0.5115595
ci_upper1 <- 0.5141452
ci_lower1 <- 0.48845425
ci_upper2 <- 0.5205948
ci_lower2 <- 0.50252432
n1 <- 50
n2 <- 50

How can I do a paired sample t.test with this information?

There are 1 best solutions below

uke On

I tried to answer your question. Note that this is more of a statistics question than a programming question, so you may have better luck asking on https://stats.stackexchange.com.

The short answer: you need one more piece of information, the covariance or correlation between the two samples. Without it, you cannot calculate a paired t-test from only the group means, their confidence intervals, and n. Here is the explanation:

First of all, the formula for the t statistic of two paired samples is this:

$$ t = \frac{\bar{D}}{S_D / \sqrt{n}} $$

where D are the differences between the two measurements of each person and n is the number of pairs. The numerator, the mean of D (denoted with a bar above), is easy to get: it is mean1 - mean2. The hard part is to get SD (written S_D in the formula), which is the standard deviation of the differences between the measurements of each person.

It is possible to reverse engineer the standard deviations of both sample 1 (let's call it x) and sample 2 (y) out of the given confidence intervals. So we can get our hands on sd(x) and sd(y). (See below for how to do this)

But we actually need SD, which is sd(x - y). This answer on the stats forum tells us how sd(x - y) and sd(x) and sd(y) are related:

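$$ \mathrm{sd}(x - y) = \sqrt{\mathrm{var}(x) + \mathrm{var}(y) - 2\,\mathrm{cov}(x, y)} $$

where var(x) = sd(x)^2 and cov(x, y) is the covariance between x and y. If you happen to have this missing piece of information, you can calculate SD and t and do the test. If you don't have the covariance, or something like the correlation from which the covariance can be derived (cov(x, y) = cor(x, y) * sd(x) * sd(y)), you cannot calculate the paired t-test.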
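To make the dependence on the missing piece concrete, here is a minimal sketch of the calculation if a correlation were available. The function name paired_t_from_summary and the value r_assumed = 0.5 are hypothetical, chosen purely for illustration; the question does not give the real correlation, so the printed result is not an answer for this data. The standard deviations are recovered from the confidence intervals under the t-interval assumption described in this answer.

```r
# Summary data from the question
mean1 <- 0.5012997
mean2 <- 0.5115595
ci_upper1 <- 0.5141452
ci_lower1 <- 0.48845425
ci_upper2 <- 0.5205948
ci_lower2 <- 0.50252432
n <- 50  # paired design, so n1 == n2

# Recover the sample SDs from the CI half-widths (assumes t-based 95% CIs)
t_crit <- qt(0.975, df = n - 1)
sd1 <- (ci_upper1 - ci_lower1) / 2 / t_crit * sqrt(n)
sd2 <- (ci_upper2 - ci_lower2) / 2 / t_crit * sqrt(n)

# Hypothetical helper: paired t-test from summary statistics plus a correlation r
paired_t_from_summary <- function(mean1, mean2, sd1, sd2, n, r) {
  # sd(x - y) = sqrt(var(x) + var(y) - 2 * cov(x, y)), with cov = r * sd1 * sd2
  sd_diff <- sqrt(sd1^2 + sd2^2 - 2 * r * sd1 * sd2)
  t_stat <- (mean1 - mean2) / (sd_diff / sqrt(n))
  df <- n - 1
  p_value <- 2 * pt(-abs(t_stat), df = df)  # two-sided p-value
  list(t = t_stat, df = df, p_value = p_value)
}

# ASSUMPTION: r = 0.5 is made up; replace it with the real correlation
r_assumed <- 0.5
paired_t_from_summary(mean1, mean2, sd1, sd2, n, r_assumed)

# The t statistic changes with r, which is why the test cannot be done without it
sapply(c(0, 0.5, 0.9), function(r)
  paired_t_from_summary(mean1, mean2, sd1, sd2, n, r)$t)
```

Running the last line with different correlations shows how strongly the result depends on r: the larger the positive correlation, the smaller sd(x - y) and the larger the absolute t value.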


FYI: How to reverse engineer the confidence intervals

First: I assume these are the standard confidence intervals for a mean, in which the unknown population standard deviation is estimated by the sample standard deviation and which therefore use the t-distribution. Then the formula looks like this:

ci_upper1 = mean1 + qt(.975, df = n1 - 1) * sd1 / sqrt(n1)

You can extract the unknown sd1 (standard deviation of sample 1) with:

sd1 = (ci_upper1 - mean1) / qt(.975, df = n1 - 1) * sqrt(n1)
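The same step can be wrapped in a small function and applied to both samples. This is only a sketch under the assumption stated above (t-based confidence intervals for a mean); the function name sd_from_ci is made up for illustration. It uses the half-width of the interval, (upper - lower) / 2, rather than ci_upper - mean, so it does not depend on rounding in the reported mean.

```r
# Hypothetical helper: recover a sample SD from a t-based confidence interval
sd_from_ci <- function(ci_lower, ci_upper, n, level = 0.95) {
  half_width <- (ci_upper - ci_lower) / 2
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)
  half_width / t_crit * sqrt(n)
}

# Data from the question
ci_upper1 <- 0.5141452
ci_lower1 <- 0.48845425
ci_upper2 <- 0.5205948
ci_lower2 <- 0.50252432
n1 <- 50
n2 <- 50

sd1 <- sd_from_ci(ci_lower1, ci_upper1, n1)
sd2 <- sd_from_ci(ci_lower2, ci_upper2, n2)
c(sd1 = sd1, sd2 = sd2)
```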