I am wondering if there is an algorithm that calculates the mean value and standard deviation of an unbounded data set.
For example, I am monitoring a measurement value, say, electric current. I would like the mean of all historical data, and whenever a new value comes in, to update the mean and standard deviation. Because there is too much data to store, I hope it can just update the mean and standard deviation on the fly without storing the data.
Even if the data were stored, the standard way, (d1 + ... + dn)/n, doesn't work: the sum will overflow the data representation.
I thought about sum(d1/n + d2/n + ... + dn/n), but if n is huge, the error is too big and accumulates. Besides, n is unbounded in this case.
The number of data points is definitely unbounded; whenever a new one arrives, the values need to be updated.
Does anybody know if there is an algorithm for it?
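For the mean part at least, there is a standard incremental update that avoids the overflowing sum entirely: instead of accumulating d1 + ... + dn, keep only the current mean and adjust it by the scaled difference to each new sample. A minimal sketch (names are my own, not from any particular library):

```python
def running_mean(stream):
    """Running mean of an unbounded stream.

    Never stores the raw sum of all samples, so it cannot overflow
    the way (d1 + ... + dn) does: the recurrence is
        mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
    """
    mean = 0.0
    n = 0
    for x in stream:
        n += 1
        mean += (x - mean) / n  # fold in the new sample
    return mean

print(running_mean([1.0, 2.0, 3.0, 4.0]))  # 2.5
```

Each update costs O(1) time and O(1) memory, so it works no matter how long the stream runs.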
So, I'm not sure if this is an algorithm that has been used before, but I'll provide it anyway. I started from the idea of calculating the standard deviation around a "wrong" mean and then correcting it based on the real mean. Here is a picture of something I wrote about it:
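My reading of that idea in code (the class and names are mine, just a sketch): accumulate the first two moments around an arbitrary shift K (here, the first sample), then correct for the difference between K and the true mean at the end. The correction comes from expanding the variance formula around the shift:

```python
import math

class ShiftedStats:
    """Streaming mean/stdev via sums shifted by a 'wrong mean' K.

    Accumulates s1 = sum(x - K) and s2 = sum((x - K)^2); then
        mean  = K + s1/n
        var   = (s2 - s1^2 / n) / n      (population variance)
    The s1^2/n term is the correction for having used K instead of
    the real mean.
    """
    def __init__(self):
        self.n = 0
        self.k = 0.0   # the shift; set to the first sample seen
        self.s1 = 0.0  # running sum of (x - k)
        self.s2 = 0.0  # running sum of (x - k)^2

    def add(self, x):
        if self.n == 0:
            self.k = x  # a shift near the mean keeps s1 and s2 small
        self.n += 1
        d = x - self.k
        self.s1 += d
        self.s2 += d * d

    def mean(self):
        return self.k + self.s1 / self.n

    def stdev(self):
        return math.sqrt((self.s2 - self.s1 * self.s1 / self.n) / self.n)
```

Note that s1 and s2 are still running sums and grow with n, so for a truly unbounded stream you may prefer a fully recurrence-based method such as Welford's online algorithm, which keeps the accumulators bounded by the data's spread.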