Power law distribution fitting in Python

4.4k Views Asked by At

I am using different python to fit density functions on a dataset. This data set is made of positive time values starting from 1 second.

I tested different density functions from scipy.statistics and the powerlaw library, as well as my own functions using scipy.optimize's function curve_fit().

So far, I obtained the best results when fitting the following "modified" power law function :

def funct(x, alpha, x0):
    return((x+x0)**(-alpha))

My code is as follow :

bins = range(1,int(s_distrib.max())+2,1)
y_data, x_data = np.histogram(s_distrib, bins=bins, density=True)
x_data = x_data[:-1]

param_bounds=([0,-np.inf],[np.inf,np.inf])
fit = opt.curve_fit(funct,
                    x_data,
                    y_data,
                    bounds=param_bounds) # you can pass guess for the parameters/errors
alpha,x0 = fit[0]
print(fit[0])

C = 1/integrate.quad(lambda t: funct(t,alpha,x0),1,np.inf)[0]

# Calculate fitted PDF and error with fit in distribution
pdf = [C*funct(x,alpha,x0) for x in x_data]
sse = np.sum(np.power(y_data - pdf, 2.0))
print(sse)

fig, ax = plt.subplots(figsize=(6,4))
ax.loglog(x_data, y_data, basex=10, basey=10,linestyle='None',  marker='.')
ax.loglog(x_data, pdf, basex=10, basey=10,linestyle='None',  marker='.')

The fitting returns a value of 8.48 for x0, and of 1.40 for alpha. In the loglog plot, the data and fit plot look like this :

plot

  • My first question is technical. Why do I get the following warning and error in opt.curve_fit when changing (x+x0) to (x-x0) in the funct function ? Since my bounds for x0 are (-inf, +inf), I was expecting the fitting to return -8.48.

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: divide by zero encountered in reciprocal This is separate from the ipykernel package so we can avoid doing imports until ValueError: Residuals are not finite in the initial point.

  • My other questions are theoritical. Is (x+x0)^(-alpha) a standard distribution? What does the x0 value represents, how to interpret this 8.48s value physically? From what I understand this means that my distribution corresponds to a shifted power law distribution? Can I consider that x0 corresponds to the xmin value classically needed when fitting data to power laws ?
  • Concerning this xmin value, I understand that it can make sense to consider only the data greater than this threshold for the fitting process to characterise the tail of the distribution. However, I am wondering what is the standard way to characterise the full data with a distribution that would be a power law after xmin and something else before xmin.

This is a lot of questions as I am very unfamiliar with the subject, any comment and answer, even partial, will be very appreciated!

1

There are 1 best solutions below

0
On

Is (x+x0)^(-alpha) a standard distribution?

To answer your second question, yes, it is standard distribution, called Zipf distribution. It is implemented in Python/NumPy as well.

What does the x0 value represents

this is shift parameter. Any distribution on top of standard parameters (like power parameter in Zipf) might have shift and scale parameters, which basically says your X values are measured in different units with different origin point.

Concerning this xmin value, I understand that it can make sense to consider only the data greater than this threshold for the fitting process to characterise the tail of the distribution.

This is how Zipf law is defined, from 0 to Infinity. Shifting it means your origin would be different