I have two vectors of correlations: one which represents real correlations, and the other permuted correlations (the null distribution). I want to find the correlation value that corresponds to a FDR of 0.05.
Updated approach:
cor_real=rnorm(1000,0,sd=0.2)
cor_null=rnorm(1000,0,sd=0.15)
d_real=density(cor_real,from=max(min(cor_real),min(cor_null)),to=min(max(cor_real),max(cor_null)))
d_null=density(cor_null,from=max(min(cor_real),min(cor_null)),to=min(max(cor_real),max(cor_null)))
# here we ensure that the x values are comparable between the two densities
plot(d_real)
lines(d_null)
Then, to find the correlation value that corresponds to FDR = 0.05, my guess would be:
ratios=d_null$y/d_real$y
d_real$x[which(round(ratios,2)==.05)]
[1] 0.5694628 0.5716372 0.5868581 0.5890325 0.5912069
# this the the correlation value(s) that corresponds to a 5% chance of a false positive
Is this the right approach?
E.g.:
cor_real=rnorm(100,0.25,sd=0.1)
cor_null=rnorm(100,0.2,sd=0.1)
h_real=hist(cor_real,plot=F)
h_null=hist(cor_null,plot=F)
plot(h_null,col=rgb(1,0,0,.5),xlim=c(0,1),ylim=c(0,max(h_real$counts))) # in red
plot(h_real,col=rgb(0,.5,.5,0.25),add=T) # in blue
I think this is when the ratio of the frequencies of the two histograms = 0.05 (null:real), but I'm not 100% sure about that.
How might I find the correlation value that corresponds to FDR = 0.05, having "access" to both the null and real distributions?
Density is not quite correct because 1. you did not set
n
andfrom, to
to be the same, 2. it calculates the number of false positive / number of false negative at only 1 bin.False discovery rate is defined as FP / (FP + TP). See this post too. We can calculate this once we put the two correlations in the same vector, label and order them:
If you look at the results,
At abs(rho) >= 0.4511517, you have 1 FP and 23 TP, giving you an FDR of 0.0416.. which is below the FDR of 0.05. So you can set your absolute cutoff here.
The example you have is pretty hard to test because both are almost the same null hypothesis with only a different sd. In real life most likely we would need to simulate data to find the correlation we obtain under the null hypothesis. And you will see that the calculation above should work pretty well.