Data reduction/transformation


Has anyone seen a method to reduce the data in order to cut down the amount of computation? What I mean is: when the number of features is huge, one may apply PCA to reduce the dimensionality and hence the computation. But what if we have only a handful of features and a huge number of data points (a time series)? How can one reduce that?


Answer by Has QUIT--Anony-Mousse:

Subsampling is fairly common.

Many statistical properties are well preserved when you subsample. If you have 1,000,000 points, the mean estimated from just 10,000 of them is already very close, and probably well within the measurement reliability of your data.
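A minimal sketch of this with NumPy, using synthetic data (the distribution and sizes here are hypothetical, chosen only to illustrate the point):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dataset: one million points with mean 5 and std dev 2.
data = rng.normal(loc=5.0, scale=2.0, size=1_000_000)

# Uniform random subsample of 10,000 points, drawn without replacement.
sample = rng.choice(data, size=10_000, replace=False)

# The subsample mean differs from the full mean by roughly
# sigma / sqrt(10_000) = 0.02 here -- far below most measurement noise.
print(data.mean(), sample.mean())
```

The standard error of the subsample mean shrinks as 1/sqrt(sample size), which is why even a 1% subsample is usually good enough for estimating simple statistics.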

Another approach is clustering with a simple, fast method such as k-means, using a large k, say sqrt(N). This approximates your data under a least-squares objective using k representative points. (You should also carry the cluster weights forward afterwards, since the resulting centroids represent different amounts of data.)
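A self-contained sketch of that idea, assuming plain Lloyd's k-means (a production version would use a library implementation such as scikit-learn's MiniBatchKMeans; the data and sizes here are made up for illustration):

```python
import numpy as np

def kmeans_reduce(X, k, iters=20, seed=0):
    """Reduce X (n x d) to k centroids plus per-centroid weights
    via plain Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # Weight = how many original points each centroid stands for.
    weights = np.bincount(labels, minlength=k)
    return centroids, weights

rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 3))       # hypothetical: many points, few features
k = int(np.sqrt(len(X)))               # k = sqrt(N), as suggested above
centroids, weights = kmeans_reduce(X, k)
```

Any downstream algorithm that accepts sample weights can then be run on the k centroids with these weights instead of on all N points.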

Last but not least, many reduction techniques - probably including PCA - can be applied to the transposed matrix. Then you reduce the number of instances instead of the number of variables. But PCA is fairly expensive, and on the transposed matrix it would scale as O(n³) in the number of points. So I would rather consider working directly with a truncated SVD.
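One way to see why the truncated SVD is cheap here: with few features d and many rows n, the Gram matrix XᵀX is only d×d, costs O(n·d²) to build, and its eigendecomposition yields the singular values and right singular vectors of X without ever forming an n×n matrix. A sketch under those assumptions (sizes and data are hypothetical; for sparse or very large problems one would instead use an iterative solver such as scipy's `svds` or scikit-learn's `TruncatedSVD`):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100_000, 5
# Hypothetical tall-and-thin data matrix: many points, few features.
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))

# X^T X is a tiny d x d matrix; its eigenvectors are the right
# singular vectors of X, and its eigenvalues are squared singular values.
G = X.T @ X
eigvals, V = np.linalg.eigh(G)              # returned in ascending order
order = np.argsort(eigvals)[::-1]
sing_vals = np.sqrt(np.clip(eigvals[order], 0.0, None))
V = V[:, order]

r = 2                                        # keep the top-r components
# r x d "summary" of the whole dataset: the dominant directions,
# scaled by how much variance they carry.
summary = np.diag(sing_vals[:r]) @ V[:, :r].T
```

This reduces the n×d dataset to an r×d summary at O(n·d²) cost, rather than the O(n³) a PCA on the transposed matrix would imply.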

But apparently your data are time series. I would suggest looking for a data reduction method that incorporates your domain knowledge about what is important here.
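As one concrete example of a time-series-aware reduction (my example, not something prescribed in the answer above): Piecewise Aggregate Approximation (PAA) replaces each fixed-length window with its mean, preserving the coarse shape of the series while shrinking it by a large factor. A minimal sketch with hypothetical data:

```python
import numpy as np

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: split the series into
    n_segments consecutive chunks and keep only each chunk's mean."""
    series = np.asarray(series, dtype=float)
    segments = np.array_split(series, n_segments)
    return np.array([seg.mean() for seg in segments])

# Hypothetical noisy sine wave with 100,000 samples.
t = np.linspace(0.0, 10.0, 100_000)
series = np.sin(t) + 0.1 * np.random.default_rng(3).normal(size=t.size)

reduced = paa(series, 200)   # 100,000 points -> 200 segment means
```

Whether segment means, extrema, or change points are the right summary depends entirely on what aspect of the series matters for the downstream task, which is exactly the domain knowledge referred to above.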