I have data with seven variables and I want to calculate pairwise correlations, along with the significance level of each correlation. The data are the effect sizes of a treatment on these seven variables. A negative value indicates an inhibiting effect and a positive value a promoting effect; the larger the absolute value, the stronger the effect on that variable. The data also contain a large number of missing values, so for each pairwise correlation I want R to ignore any observation where one or both of the variables are missing.
Here is a sample dataset:
set.seed(123)
# Create the dataset with effect sizes and missing values
mydata <- data.frame(
  Var1 = sample(c(-20:14, NA), 200, replace = TRUE),
  Var2 = sample(c(-20:14, NA), 200, replace = TRUE),
  Var3 = sample(c(-20:14, NA), 200, replace = TRUE),
  Var4 = sample(c(-20:14, NA), 200, replace = TRUE),
  Var5 = sample(c(-20:14, NA), 200, replace = TRUE),
  Var6 = sample(c(-20:14, NA), 200, replace = TRUE),
  Var7 = sample(c(-20:14, NA), 200, replace = TRUE)
)
# Set more than 50% missing values in each column
for (col in 1:7) {
  missing_indices <- sample(1:200, size = 101)
  mydata[missing_indices, col] <- NA
}
My question is: is it possible to calculate pairwise correlations, along with their significance levels, using effect size values in this case?
You can use cor.test(), which only uses the relevant data for the pair of observations. Unlike cor(), cor.test() works only for one x and one y at a time. In the code below, I use outer() to run through all pairs of values. I do this for both the correlation (stored in r) and its p-value (stored in p). First, we make the data according to your specification. Next, we can make the correlations and their p-values.
Created on 2023-06-28 with reprex v2.0.2
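A minimal sketch of the outer() approach described above might look like the following. It is an illustration under assumptions, not necessarily the exact code from the reprex: the helper name pairwise_stat is hypothetical, and the data are rebuilt compactly along the lines of the question's sample.

```r
# Example data along the lines of the question (effect sizes with many NAs)
set.seed(123)
mydata <- as.data.frame(
  replicate(7, sample(c(-20:14, NA), 200, replace = TRUE))
)
names(mydata) <- paste0("Var", 1:7)
for (col in 1:7) {
  mydata[sample(1:200, size = 101), col] <- NA
}

# Hypothetical helper: build a matrix holding one element of the cor.test()
# result ("estimate" or "p.value") for every pair of columns in `data`.
pairwise_stat <- function(data, what) {
  one_pair <- function(i, j) {
    # cor.test() drops rows where either variable is missing
    unname(cor.test(data[[i]], data[[j]])[[what]])
  }
  idx <- seq_along(data)
  # outer() hands over whole vectors of indices, so the pair function
  # has to be vectorized before it is passed in
  m <- outer(idx, idx, Vectorize(one_pair))
  dimnames(m) <- list(names(data), names(data))
  m
}

r <- pairwise_stat(mydata, "estimate")  # correlations
p <- pairwise_stat(mydata, "p.value")   # p-values

round(r, 2)
round(p, 3)
```

As a sanity check, r should agree with cor(mydata, use = "pairwise.complete.obs"), since both drop incomplete rows pair by pair.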
The question you asked is not entirely one of mechanics. As you can see, the operations for calculating pairwise correlations with p-values are not all that difficult. The other part of your question is: does it make sense to calculate these values on variables that contain effect sizes? I suppose if the observations are the same (i.e., the rows represent the same observation for each effect size), then it could make sense to do this. If you're looking for more advice about that part of your question, you would likely be better off posting it on Cross Validated.
Edit: Errors in correlations
In the comments, the OP suggested that there were some pairs that were producing errors. The problem with the actual use-case data (not the example data) is that there are two pairs of variables where there are only two observations. This triggers an error in cor.test(). To solve this, you could put a try() statement around the call to catch the errors and then return a missing value if there was an error. The code will generate errors, but those errors are absorbed by try(), so the computation will continue. The resulting r and p matrices will have some missing values where there were pairs of variables with too few values to calculate the correlation.
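The try() idea can be sketched as follows. This is a hedged illustration rather than the original reprex: the toy data frame and the helper name safe_pairwise_stat are made up for the example, and silent = TRUE is my own choice to keep the error messages from being printed.

```r
# Made-up data where the pair (a, b) shares only two complete rows,
# which is too few for cor.test() (it needs at least three).
toy <- data.frame(
  a = c(1, 2, NA, NA, NA, 3),
  b = c(2, 1, 5, 4, 3, NA),
  c = c(3, 1, 4, 1, 5, 9)
)

# Number of complete observations available for each pair of columns
n <- crossprod(!is.na(toy))

# Hypothetical helper: like a plain pairwise cor.test() loop, but a pair
# that makes cor.test() fail yields NA instead of stopping the computation.
safe_pairwise_stat <- function(data, what) {
  one_pair <- function(i, j) {
    res <- try(cor.test(data[[i]], data[[j]]), silent = TRUE)
    if (inherits(res, "try-error")) {
      return(NA_real_)
    }
    unname(res[[what]])
  }
  idx <- seq_along(data)
  m <- outer(idx, idx, Vectorize(one_pair))
  dimnames(m) <- list(names(data), names(data))
  m
}

r <- safe_pairwise_stat(toy, "estimate")
p <- safe_pairwise_stat(toy, "p.value")

n  # the (a, b) cell shows only 2 shared observations
r  # NA for the (a, b) and (b, a) cells, numbers elsewhere
p
```

Inspecting n before interpreting r and p shows which correlations rest on very few observations, even when cor.test() does not fail outright.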