Python - How to calculate standard deviation from normalized frequencies


I want to find the standard deviation of a normalized frequency.

I have frequency distributions on a scale from 1 to 9, normalized to add up to 1. The values are stored as floats in separate pandas columns:

df[names].iloc[0]

pred_percet_rating_1    0.009985
pred_percet_rating_2    0.023371
pred_percet_rating_3    0.045363
pred_percet_rating_4    0.090492
pred_percet_rating_5    0.134723
pred_percet_rating_6    0.188476
pred_percet_rating_7    0.202444
pred_percet_rating_8    0.204562
pred_percet_rating_9    0.100585

This first row represents one product that has been rated by people. It was most often rated a 7 (about 20 percent of the ratings) or an 8 (also about 20 percent of the ratings).

Now I want to calculate a standard deviation for each row, but all my approaches fail because I have to account for the distance between the columns somehow. I already tried np.histogram and using its return values to calculate a standard deviation, but to no avail.

Any pointers are more than welcome!


Answer by Bob:

You have to calculate the mean xm = sum(x[i])/n and the variance sum((x[i] - xm)**2)/n. If you group the repeated ratings, you will find that the coefficient of each unique value is its frequency in your table.

So the mean is mu = np.sum(x * f), and the standard deviation is np.sqrt(np.sum(f * (x - mu)**2))

For your example data it can be computed like this:

import numpy as np

f = np.array([0.009985, 0.023371, 0.045363, 0.090492,
              0.134723, 0.188476, 0.202444, 0.204562, 0.100585])
x = np.arange(1, 10)
mu = np.sum(x * f)                        # weighted mean: 6.318124
sigma = np.sqrt(np.sum(f * (x - mu)**2))  # standard deviation: ~1.83
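Since the question asks for a standard deviation per row, the same weighted-moment formula can be vectorized over the whole DataFrame. A minimal sketch, assuming the column names shown in the printed row above (the single-row DataFrame here is just illustrative data):

```python
import numpy as np
import pandas as pd

# Assumed column layout, taken from the row printed in the question
names = [f"pred_percet_rating_{i}" for i in range(1, 10)]
df = pd.DataFrame([[0.009985, 0.023371, 0.045363, 0.090492,
                    0.134723, 0.188476, 0.202444, 0.204562, 0.100585]],
                  columns=names)

x = np.arange(1, 10)              # the rating scale 1..9
F = df[names].to_numpy()          # frequencies, shape (n_rows, 9)

mu = F @ x                        # weighted mean, one value per row
var = (F * (x - mu[:, None]) ** 2).sum(axis=1)
df["rating_std"] = np.sqrt(var)   # standard deviation per row
```

Broadcasting `x - mu[:, None]` against `F` does the per-row centering in one step, so no Python-level loop over rows is needed.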