Python: two normal distribution

2.5k Views Asked by At

I have two data sets where two values where measured. I am interested in the difference between the value and the standard deviation of the difference. I made a histogram which I would like to fit two normal distributions. To calculate the difference between the maxima. I also would like to evaluate the effect that in on data set I have much less data on one value. I've already looked at this link but it is not really what I need: Python: finding the intersection point of two gaussian curves

enter image description here

enter image description here

for ii in range(2,8):
   # Kanal = ii - 1
    file = filepath + '\Mappe1.txt'
    data = np.loadtxt(file, delimiter='\t', skiprows=1)
    data = data[:,ii]
    plt.hist(data,bins=100)
    plt.xlabel("bins")
    plt.ylabel("Counts")
    plt.tight_layout()
    plt.grid()
    plt.figure()

plt.show()
2

There are 2 best solutions below

0
On

Quick and dirty fitting can be readily achieved using scipy:

from scipy.optimize import curve_fit #non linear curve fitting tool
from matplotlib import pyplot as plt

def func2fit(x1,x2,m_1,m_2,std_1,std_2,height1, height2): #define a simple gauss curve
    return height1*exp(-(x1-m_1)**2/2/std_1**2)+height2*exp(-(x2-m_2)**2/2/std_2**2)

init_guess=(-.3,.3,.5,.5,3000,3000) 
#contains the initial guesses for the parameters (m_1, m_2, std_1, std_2, height1, height2) using your first figure

#do the fitting
fit_pars, pcov =curve_fit(func2fit,xdata,ydata,init_guess) 
#fit_pars contains the mean, the heights and the SD values, pcov contains the estimated covariance of these parameters 

plt.plot(xdata,func2fit(xdata,*fit_pars),label='fit') #plot the fit

For further reference consult the scipy manual page: curve_fit

2
On

Assuming that the two samples are independent there is no need to handle this problem using curve fitting. It's basic statistics. Here's some code that does the calculations required, with the source attributed in a comment.

## adapted from http://onlinestatbook.com/2/estimation/difference_means.html

from random import gauss
from numpy import sqrt

sample_1 = [ gauss(0,1) for _ in range(10) ]
sample_2 = [ gauss(1,.5) for _ in range(20) ]

n_1 = len(sample_1)
n_2 = len(sample_2)

mean_1 = sum(sample_1)/n_1
mean_2 = sum(sample_2)/n_2

SSE = sum([(_-mean_1)**2 for _ in sample_1]) + sum([(_-mean_2)**2 for _ in sample_2])
df = (n_1-1) + (n_2-1)
MSE = SSE/df

n_h = 2 / ( 1/n_1 + 1/n_2 )
s_mean_diff = sqrt( 2* MSE / n_h )

print ( 'difference between means', abs(n_1-n_2))
print ( 'std dev of this difference', s_mean_diff )