I have a NumPy array with a large number of data points (53,046,323). The data represent durations and follow a discrete distribution; after some searching, I believe a Boltzmann distribution can fit them. I ran several trials to estimate the best parameters for fitting the data, and the best result was with lambda=1:
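import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import boltzmann

# Load the object array of per-cell duration lists and flatten it into one 1-D array
sa1=np.load('all_4_daily_consec_count_baseline_list.npy',allow_pickle=True)
nn=sa1.tolist()
data=np.concatenate(nn)
plt.hist(data, bins=int(np.max(data)), density=True, alpha=0.5)
plt.plot(data, boltzmann.pmf(data,1,53046322), 'go', markersize=9)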
But it does not fit all values, as shown in the figure (test_boltzmann_full).
So I tried to fit a subset of the data, shown in the figure (test2), which has the same distribution shape and 585 data points:
sa1=np.load('all_4_daily_consec_count_baseline_list.npy',allow_pickle=True)
# Reshape to the 607 x 484 grid and pick the durations of a single cell
xx=np.reshape(sa1,(607,484))
noov_h=xx[306,250]
noov_hh=noov_h.astype('float')
# Drop NaN entries before plotting
data=noov_hh[~np.isnan(noov_hh)]
plt.hist(data, bins=int(np.max(data)), density=True, alpha=0.5)
plt.plot(data, boltzmann.pmf(data,1,584), 'go', markersize=9)
How can I get the best-fitting parameters for my data in both cases? Is there an adaptive way to get a better fit?
The data is available at the link (data), since it is too large to upload here.
scipy.stats.fit can be used to fit distribution parameters to data. Here's an example of fitting the parameters of the Boltzmann distribution to data sampled from a Boltzmann distribution.
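A minimal sketch of such a fit follows, assuming SciPy 1.9 or later (where scipy.stats.fit is available). The true parameter values, the sample size, the seed, and the search bounds are illustrative assumptions, not values taken from the question's data.

```python
import numpy as np
from scipy import stats

# Illustrative "true" parameters (assumed, not from the question's data)
rng = np.random.default_rng(12345)
true_lambda, true_N = 1.4, 19

# Draw a synthetic sample from a Boltzmann distribution
data = stats.boltzmann.rvs(true_lambda, true_N, size=2000, random_state=rng)

# Search bounds for the shape parameters (assumed ranges).
# N must exceed max(data), because the support is 0, 1, ..., N-1.
bounds = {"lambda_": (0.01, 5), "N": (1, 100)}

# Maximum likelihood fit; loc is held fixed at 0 when no bound is given for it
res = stats.fit(stats.boltzmann, data, bounds)

print(res.params)  # named tuple with fields lambda_, N, loc
```

The fitted values are available as res.params.lambda_ and res.params.N, and res.plot() can overlay the fitted PMF on a histogram of the data (it requires matplotlib). With parameters like these, the tail of the distribution is so small that N is only weakly identified: any N larger than the largest observation gives almost the same likelihood, so the estimate of lambda_ is the more meaningful one. For real data, the bounds would need to be widened so that the upper bound on N is above the largest observed value.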