In R, I would like to conduct a paired-sample t-test for two samples. I don't have the raw data. I only have three pieces of information: 1) the means of both groups, 2) their 95% confidence intervals, and 3) their Ns. Here is the data:
mean1 <- 0.5012997
mean2 <- 0.5115595
ci_upper1 <- 0.5141452
ci_lower1 <- 0.48845425
ci_upper2 <- 0.5205948
ci_lower2 <- 0.50252432
n1 <- 50
n2 <- 50
How can I do a paired sample t.test with this information?
I tried to answer your question. Note that this is more a statistics question than a programming question, so you will have better luck asking on https://stats.stackexchange.com.
The short answer: you need one more piece of information, the covariance or correlation between the two samples. Without it, you cannot calculate a paired t-test from only the group means, their confidence intervals, and n. Here is the explanation:
First of all, the formula for the t statistic of two paired samples is this:
t = mean(D) / (SD / sqrt(n))
where D are the differences between the two measurements of each person and n is the number of pairs. The numerator, the mean of D (usually written with a bar above), is easy to get. It is
mean1 - mean2.
The hard part is SD, the standard deviation of the differences between the measurements of each person.
It is possible to reverse engineer the standard deviations of both sample 1 (let's call it x) and sample 2 (y) from the given confidence intervals. So we can get our hands on sd(x) and sd(y). (See below for how to do this.)
But we actually need SD, which is sd(x - y). This answer on the stats forum tells us how sd(x - y), sd(x), and sd(y) are related:
sd(x - y) = sqrt(sd(x)^2 + sd(y)^2 - 2 * cov(x, y))
where cov(x, y) is the covariance between x and y. If you happen to have this missing piece of information, you can calculate SD and t and do the test. If you don't have the covariance, or something like the correlation, which determines the covariance together with sd(x) and sd(y), you cannot calculate the paired t-test.
FYI: How to reverse engineer the confidence intervals
First: I assume these confidence intervals estimate the population variance with the sample variance, which is pretty standard, and therefore use the t-distribution. Then the formula looks like this:
ci_upper1 = mean1 + qt(.975, df = n1 - 1) * sd1 / sqrt(n1)
You can extract the unknown sd1 (standard deviation of sample 1) with:
sd1 = (ci_upper1 - mean1) / qt(.975, df = n1 - 1) * sqrt(n1)
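Putting the pieces together, here is a minimal sketch of the whole calculation in R. It recovers both standard deviations from the upper confidence bounds (assuming t-based 95% intervals, as above) and then computes the paired t statistic for a few values of the correlation r. The helper names sd_from_ci and paired_t_from_summary are made up for this illustration, and the values of r are pure assumptions, since the question does not provide the correlation. It uses cov(x, y) = r * sd(x) * sd(y).

```r
# Summary statistics from the question
mean1 <- 0.5012997; mean2 <- 0.5115595
ci_upper1 <- 0.5141452; ci_upper2 <- 0.5205948
n1 <- 50; n2 <- 50

# Recover a sample's standard deviation from the upper bound of its CI
# (assumes a t-based 95% confidence interval)
sd_from_ci <- function(ci_upper, mean, n) {
  (ci_upper - mean) / qt(0.975, df = n - 1) * sqrt(n)
}
sd1 <- sd_from_ci(ci_upper1, mean1, n1)
sd2 <- sd_from_ci(ci_upper2, mean2, n2)

# Paired t-test from summary statistics plus an ASSUMED correlation r
# (hypothetical helper, not part of base R)
paired_t_from_summary <- function(mean1, mean2, sd1, sd2, n, r) {
  # sd(x - y), with cov(x, y) written as r * sd(x) * sd(y)
  sd_diff <- sqrt(sd1^2 + sd2^2 - 2 * r * sd1 * sd2)
  t_stat <- (mean1 - mean2) / (sd_diff / sqrt(n))
  p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
  c(r = r, sd_diff = sd_diff, t = t_stat, p = p_value)
}

# r is NOT given in the question; these values are purely illustrative
results <- t(sapply(c(0, 0.5, 0.9), function(r) {
  paired_t_from_summary(mean1, mean2, sd1, sd2, n1, r)
}))
print(results)
```

Each row of results corresponds to one assumed correlation. The larger the assumed r, the smaller sd(x - y) and the larger the absolute t value, which is exactly why the test cannot be run without knowing the correlation.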