How to find value that corresponds to FDR=0.05 given distribution A and a null distribution?

300 Views Asked by At

I have two vectors of correlations: one which represents real correlations, and the other permuted correlations (the null distribution). I want to find the correlation value that corresponds to a FDR of 0.05.

Updated approach:

cor_real=rnorm(1000,0,sd=0.2)
cor_null=rnorm(1000,0,sd=0.15)

d_real=density(cor_real,from=max(min(cor_real),min(cor_null)),to=min(max(cor_real),max(cor_null)))
d_null=density(cor_null,from=max(min(cor_real),min(cor_null)),to=min(max(cor_real),max(cor_null)))
# here we ensure that the x values are comparable between the two densities

plot(d_real)
lines(d_null)

enter image description here

Then, to find the correlation value that corresponds to FDR = 0.05, my guess would be:

ratios=d_null$y/d_real$y
d_real$x[which(round(ratios,2)==.05)]
[1] 0.5694628 0.5716372 0.5868581 0.5890325 0.5912069
# this the the correlation value(s) that corresponds to a 5% chance of a false positive

Is this the right approach?


E.g.:

cor_real=rnorm(100,0.25,sd=0.1)
cor_null=rnorm(100,0.2,sd=0.1)

h_real=hist(cor_real,plot=F)
h_null=hist(cor_null,plot=F)

plot(h_null,col=rgb(1,0,0,.5),xlim=c(0,1),ylim=c(0,max(h_real$counts))) # in red
plot(h_real,col=rgb(0,.5,.5,0.25),add=T) # in blue

Real = blue, null = red

I think this is when the ratio of the frequencies of the two histograms = 0.05 (null:real), but I'm not 100% sure about that.

How might I find the correlation value that corresponds to FDR = 0.05, having "access" to both the null and real distributions?

1

There are 1 best solutions below

5
On BEST ANSWER

Density is not quite correct because 1. you did not set n and from, to to be the same, 2. it calculates the number of false positive / number of false negative at only 1 bin.

False discovery rate is defined as FP / (FP + TP). See this post too. We can calculate this once we put the two correlations in the same vector, label and order them:

set.seed(321)
cor_real=rnorm(1000,0,sd=0.2)
cor_null=rnorm(1000,0,sd=0.15)

df = data.frame(rho = c(cor_real,cor_null),
                type = rep(c(TRUE,FALSE),each=1000))
df$rho = abs(df$rho)
df = df[order(df$rho,decreasing=TRUE),]

df$FP = cumsum(df$type == FALSE)
df$TP = cumsum(df$type == TRUE)

df$FDR = df$FP / (df$FP + df$TP)

If you look at the results,

head(df,25)
           rho  type FP TP        FDR
366  0.5822139  TRUE  0  1 0.00000000
247  0.5632078  TRUE  0  2 0.00000000
298  0.5594879  TRUE  0  3 0.00000000
147  0.5460875  TRUE  0  4 0.00000000
781  0.5373146  TRUE  0  5 0.00000000
760  0.5367116  TRUE  0  6 0.00000000
797  0.5216281  TRUE  0  7 0.00000000
569  0.5204598  TRUE  0  8 0.00000000
374  0.5200687  TRUE  0  9 0.00000000
744  0.5101275  TRUE  0 10 0.00000000
864  0.5058457  TRUE  0 11 0.00000000
227  0.4997959  TRUE  0 12 0.00000000
66   0.4993164  TRUE  0 13 0.00000000
14   0.4886520  TRUE  0 14 0.00000000
830  0.4840573  TRUE  0 15 0.00000000
261  0.4765394  TRUE  0 16 0.00000000
1163 0.4703764 FALSE  1 16 0.05882353
27   0.4661862  TRUE  1 17 0.05555556
965  0.4633883  TRUE  1 18 0.05263158
530  0.4608271  TRUE  1 19 0.05000000
96   0.4575683  TRUE  1 20 0.04761905
851  0.4563224  TRUE  1 21 0.04545455
922  0.4516161  TRUE  1 22 0.04347826
343  0.4511517  TRUE  1 23 0.04166667

At abs(rho) >= 0.4511517, you have 1 FP and 23 TP, giving you an FDR of 0.0416.. which is below the FDR of 0.05. So you can set your absolute cutoff here.

The example you have is pretty hard to test because both are almost the same null hypothesis with only a different sd. In real life most likely we would need to simulate data to find the correlation we obtain under the null hypothesis. And you will see that the calculation above should work pretty well.