I am not looking for particular code, but rather for a general theory of what could be done about this problem.
We have a stream of incoming data. The entries are quite large, so it is infeasible to store more than a few of them, say M, in memory. The goal is to estimate the average of the last K entries seen in the stream, with K >> M. We do have some statistical information about the data, such as its mean and variance, which we can realistically compute online as the data arrives. The incoming data is the result of several unknown stochastic processes.
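
To make the "computed online" part concrete, here is a minimal sketch (in Python, with names of my own choosing) of how the running mean and variance can be maintained with Welford's update as entries stream in:

```python
class RunningStats:
    """Running mean/variance over the whole stream (Welford's update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.n if self.n > 1 else 0.0
```

Note that this gives statistics over the entire stream so far, not over the last K entries, which is exactly the gap I am asking about.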
How could we approximate the mean of the last K elements in such a setting?
Of course, if M were close to K or greater than K, the problem could be solved trivially by storing the past M values and computing the mean from them. But with a large gap between the two values (e.g., M = 4, K = 400), this approximation quickly degrades in quality. Compressing the data points to reduce the memory they occupy does not gain enough (M' <= 3M, where M' is the number of compressed entries we can store). A linear/quadratic/cubic approximation of the past K entries (based on just the last value we have seen) results in an error roughly as bad as just using the past M entries.
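
For concreteness, this is the baseline I mean (a minimal Python sketch; the function name and parameters are mine): keep only the last M entries in a ring buffer and report their mean as a stand-in for the last-K mean. With M = 4 and K = 400 the estimator only ever sees 1% of the window, which is why it degrades.

```python
from collections import deque

def last_m_mean(stream, m=4):
    """Baseline: estimate the mean of the last K entries using only the last M.

    `stream` is any iterable of numbers; `m` is the memory budget in entries.
    Yields the current estimate after each new entry.
    """
    window = deque(maxlen=m)  # ring buffer holding only the last M entries
    for x in stream:
        window.append(x)
        yield sum(window) / len(window)  # stands in for the true last-K mean
```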