I want to find the standard deviation of a normalized frequency distribution.
I have frequency distributions on a scale from 1 to 9, normalized to add up to 1. Each value is stored as a float in its own pandas column:
df[names].iloc[0]
pred_percet_rating_1 0.009985
pred_percet_rating_2 0.023371
pred_percet_rating_3 0.045363
pred_percet_rating_4 0.090492
pred_percet_rating_5 0.134723
pred_percet_rating_6 0.188476
pred_percet_rating_7 0.202444
pred_percet_rating_8 0.204562
pred_percet_rating_9 0.100585
This first row represents one product that has been rated by people. It has most often been rated a 7 (about 20 percent of the ratings) or an 8 (also about 20 percent).
Now I want to calculate a standard deviation for each row, but all my approaches fail because I somehow have to account for the distance between the columns. I already tried np.histogram, hoping to use its return values to calculate a standard deviation, but to no avail.
Any pointers are more than welcome!
You have to calculate the mean and the variance:

sum(x[i]) / n
sum((x[i] - xm)**2) / n

If you group the repeated ratings, you will find that the coefficient of each unique value is its frequency in your table. So the mean is

mu = np.sum(x * f)

and the standard deviation is

np.sqrt(np.sum(f * (x - mu)**2))

For your example data it would be computed like this:
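A minimal sketch, assuming the nine `pred_percet_rating_*` columns from the question (the `names` list and the sample row are reconstructed here; in your code use your existing DataFrame). Broadcasting over the row axis gives the mean and standard deviation for every product at once:

```python
import numpy as np
import pandas as pd

# Rebuild the sample row from the question (column names as assumed)
names = [f"pred_percet_rating_{i}" for i in range(1, 10)]
df = pd.DataFrame(
    [[0.009985, 0.023371, 0.045363, 0.090492, 0.134723,
      0.188476, 0.202444, 0.204562, 0.100585]],
    columns=names,
)

x = np.arange(1, 10)        # the rating values 1..9
f = df[names].to_numpy()    # frequencies, one row per product

# Weighted mean and standard deviation per row
mu = (f * x).sum(axis=1)
std = np.sqrt((f * (x - mu[:, None]) ** 2).sum(axis=1))

df["rating_mean"] = mu
df["rating_std"] = std
print(df[["rating_mean", "rating_std"]])
```

For your first row this gives a mean of about 6.32 and a standard deviation of about 1.83, which matches the intuition that ratings cluster around 6-8.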