Decorrelating a large number of random variables using numpy

I'm trying to decorrelate a large number of correlated random variables. Specifically, I generate a 1200 x 1000 data set as follows, where 1200 is the number of variables and 1000 is the number of data points per variable.

import numpy as np

seed = 0
sample_size = 1000
n_var = 1200
total_rng = np.random.RandomState(seed=seed).randn(sample_size * n_var).reshape((n_var, sample_size))

The correlations between the generated variables can be surprisingly large depending on the seed (I've seen pairwise values close to 0.1~0.2 from time to time), and I need the variables to have close to 0 correlation (preferably within 1e-4, but anything less than 1e-2 could work as well).
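For concreteness, the correlation I'm referring to is the largest off-diagonal entry of the correlation matrix, checked with something like this sketch (any equivalent check works):

corr = np.corrcoef(total_rng)                      # 1200 x 1200 correlation matrix (rows are variables)
max_offdiag = np.abs(corr - np.eye(n_var)).max()   # largest pairwise correlation in absolute value
print(max_offdiag)                                 # lands in the 0.1~0.2 range for some seeds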

I tried a lot of different methods, such as Cholesky whitening and ZCA whitening (How to implement ZCA Whitening? Python), but the former failed because the sample covariance matrix is not positive definite (with 1200 variables estimated from only 1000 samples it is necessarily singular), while the latter simply could not bring the correlations close to 0 by the criteria above.
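For reference, the ZCA attempt followed the linked answer and looked roughly like the sketch below; the eps value is just illustrative, and it is there because the singular sample covariance has near-zero eigenvalues.

def zca_whiten(X, eps=1e-8):
    # X has shape (n_var, n_samples); rows are variables
    Xc = X - X.mean(axis=1, keepdims=True)            # center each variable
    cov = Xc @ Xc.T / (Xc.shape[1] - 1)               # sample covariance matrix, (n_var, n_var)
    U, S, _ = np.linalg.svd(cov)                      # eigenvectors/eigenvalues of the symmetric covariance
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T     # ZCA whitening matrix
    return W @ Xc                                     # whitened data, same shape as X

whitened = zca_whiten(total_rng)

The eps only keeps the near-zero eigenvalues from blowing up the inverse square root; it does not make the covariance full rank.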

Is there any way to reduce the correlation significantly? Or would this be the best I could do for such a large number of variables?

Thank you!
