I want to translate the following code from R to Python using scipy.stats.probplot.
qqplot(-log10(ppoints(1000)), -log10(p_value))
This is a Q-Q plot of p-values against the uniform distribution on a -log10 scale. I am after something like the following. (I know that other libraries can do this, but I am looking for an answer that uses probplot.)
probplot(-np.log10(p_values_data), dist="uniform", sparams=(0, 1), plot=plt)
This does not work correctly because the x-axis (the theoretical quantiles) is uniform, while the data are on the -log10 scale. Here, plt comes from import matplotlib.pyplot as plt. I found the post here, among others, but I did not find anything on modifying the dist parameter to accommodate -log10(uniform).
How can I get this plot using probplot?
Here is a revision of the problem description.
Here is the data generation.
import numpy as np
from scipy.stats import chi2, probplot
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt

def compute_p_with_chi2(x, y):
    model = ols('y ~ x', data=dict(y=y, x=x)).fit()
    t_stat = model.tvalues['x']
    p_value = 1 - chi2.cdf(t_stat**2, 1)
    return p_value

def compute_pvalues(X_data, p_data):
    p_values = []
    for col in X_data.T:
        p_value = compute_p_with_chi2(col, p_data)
        p_values.append(p_value)
    return p_values

n = 100
p = 1000
X = np.random.binomial(2, 0.4, size=(n, p))
y = np.random.normal(size=n)
p_values = compute_pvalues(X, y)
Plotting a histogram of the p-values, I get a uniform distribution, as expected.
plt.hist(p_values)
However, when I draw the Q-Q plot with probplot, I do not get two overlapping diagonals. Here is what I get.
probplot(-np.log10(p_values), dist="uniform", sparams=(0, 1), plot=plt)
I am including the desired output from R, produced with the first code snippet above.
My feeling is that this is something very simple, but I am somehow missing it.
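For reference, here is a rough NumPy sketch of what the R call at the top computes. The helper names ppoints and neglog10_qq are hypothetical (they are not part of NumPy or SciPy), and the sketch assumes the reference sample has the same length as the p-values, as it does here with 1000 of each, so no interpolation between samples is needed.

```python
import numpy as np

def ppoints(n):
    """Hypothetical helper mimicking R's ppoints(n) with its default offset."""
    a = 3.0 / 8.0 if n <= 10 else 0.5
    return (np.arange(1, n + 1) - a) / (n + 1 - 2 * a)

def neglog10_qq(p_values):
    """Return sorted (reference, observed) values on the -log10 scale.

    Assumes one reference point per p-value, so the two sorted
    vectors can be paired directly.
    """
    p_values = np.asarray(p_values, dtype=float)
    x = np.sort(-np.log10(ppoints(p_values.size)))  # uniform reference
    y = np.sort(-np.log10(p_values))                # observed p-values
    return x, y
```

Plotting x against y, for example with plt.scatter(x, y), should give the kind of picture the R call produces when the two samples have equal length.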





You can manually transform the data in order to compare them to a uniform distribution:
and then use the command you've already provided.
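A different route that keeps the -log10 scale on both axes, offered as a sketch under stated assumptions rather than as this answer's exact code: if U is uniform on (0, 1), then -log10(U) = -ln(U)/ln(10) follows an exponential distribution with scale 1/ln(10). So probplot can be given dist="expon" with sparams=(0, 1/np.log(10)), where the two values are loc and scale. The seed and the simulated p-values below are illustrative stand-ins for real data.

```python
import numpy as np
from scipy.stats import probplot

# Illustrative stand-in for real p-values (assumption: uniform under the null).
rng = np.random.default_rng(0)
p_values = rng.uniform(size=1000)

neg_log_p = -np.log10(p_values)

# -log10(Uniform(0, 1)) is exponential with loc=0 and scale=1/ln(10),
# so the theoretical quantiles land on the same -log10 scale as the data.
(theoretical, ordered), (slope, intercept, r) = probplot(
    neg_log_p, dist="expon", sparams=(0, 1 / np.log(10))
)

# To draw the figure, pass plot=plt (with matplotlib.pyplot imported as plt):
# probplot(neg_log_p, dist="expon", sparams=(0, 1 / np.log(10)), plot=plt)
```

With p-values that really are uniform, the fitted slope should be close to 1 and the intercept close to 0, so the points and the fitted line lie near the diagonal.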