Has anyone seen a method for reducing the amount of data in order to cut down on computation? What I mean is: when the number of features is huge, one can apply PCA to reduce the dimensionality and the computation. But what if we have only a handful of features and a huge number of data points (time series)? How can that be reduced?
Data reduction/transformation
Asked by OverFlow Police
1 Answer
Subsampling is fairly common.
Many statistical properties are well preserved when you subsample. If you have 1,000,000 points, the mean estimated from a random sample of just 10,000 is already very close (the standard error of the mean shrinks as 1/sqrt(n)), and may well be within the reliability of your data.
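As a rough illustration of the subsampling point, here is a minimal NumPy sketch. The data is synthetic (normally distributed, 3 features), which is purely an assumption for the demo; with real data you would draw the random indices the same way.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 1,000,000 points, 3 features.
X = rng.normal(loc=5.0, scale=2.0, size=(1_000_000, 3))

# Draw 10,000 distinct row indices uniformly at random.
idx = rng.choice(X.shape[0], size=10_000, replace=False)
sample = X[idx]

# Compare the per-feature mean of the sample with that of the full data.
full_mean = X.mean(axis=0)
sample_mean = sample.mean(axis=0)
max_abs_diff = np.abs(full_mean - sample_mean).max()
print("largest difference between means:", max_abs_diff)
```

For time series, note that uniform random subsampling throws away the ordering between neighbouring points, so it mostly suits statistics that do not depend on temporal structure.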
Another approach is clustering with a simple and fast method such as k-means, with a large k, say sqrt(N). This approximates your data with k representative points (the cluster centers) under a least-squares objective. (You should also use weights afterwards, e.g. the number of points in each cluster, as the resulting vectors represent different amounts of data.)
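A minimal sketch of the k-means idea, written in plain NumPy so it is self-contained. The function name `kmeans_reduce`, the fixed iteration count, and the random initialisation are my own choices for illustration; a library implementation (e.g. a mini-batch k-means) would usually be faster on really large data.

```python
import numpy as np

def kmeans_reduce(X, k, n_iter=10, seed=0):
    """Plain Lloyd's k-means; returns (centers, counts).

    counts[j] is the number of points assigned to center j and
    serves as the weight of that center in later computations.
    """
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k, dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center (n x k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        counts = np.bincount(labels, minlength=k)
        # Sum the points of each cluster, then divide by the cluster size.
        sums = np.zeros_like(centers)
        np.add.at(sums, labels, X)
        nonempty = counts > 0
        centers[nonempty] = sums[nonempty] / counts[nonempty, None]
    return centers, counts

rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 3))       # synthetic data, for the demo only
k = int(np.sqrt(len(X)))               # "large k, say sqrt(N)"
centers, counts = kmeans_reduce(X, k)

# Weighting the centers by cluster size recovers the mean of the full data.
weighted_mean = np.average(centers, axis=0, weights=counts)
```

The weights matter: an unweighted mean of the centers would treat a cluster of 5 points the same as a cluster of 500.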
Last but not least, many reduction techniques, probably including PCA, can be applied to the transposed matrix. Then you reduce the number of instances rather than the number of variables. But PCA is fairly expensive, and on the transposed matrix it would scale as O(n³) in the number of instances. So I would rather work directly with a truncated SVD.
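One possible reading of the SVD suggestion, as a sketch only: take a thin SVD of the n x d data matrix, keep the top k singular values and right singular vectors, and use the k rows of diag(s) · Vt as a stand-in for the original n rows. The helper name `svd_reduce_rows` is hypothetical. With only a handful of features the thin SVD costs roughly O(n·d²), so it stays cheap even for large n.

```python
import numpy as np

def svd_reduce_rows(X, k):
    """Replace the n rows of X (n x d) by k rows R such that
    R.T @ R approximates X.T @ X (exactly when k equals the rank of X)."""
    # Thin SVD: U is n x r, s holds r singular values, Vt is r x d, r = min(n, d).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, s.size)
    # Scale the top-k right singular vectors by their singular values.
    return s[:k, None] * Vt[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 5))      # synthetic data, for the demo only
R = svd_reduce_rows(X, k=5)

# The reduced matrix preserves the Gram matrix of the features.
gram_error = np.abs(R.T @ R - X.T @ X).max()
print(R.shape, gram_error)
```

Note that this does not center the data, so it is a truncated SVD rather than PCA. The reduced rows are no longer individual observations; they only preserve second-order structure, which is enough for things like least-squares fits on the features.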
But apparently your data are time series. I would suggest looking for a data reduction method that incorporates your knowledge of what is important in them.
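Just to make this concrete, here is one example of a reduction that uses knowledge about the series: if you know the signal of interest changes slowly compared to the sampling rate, you can average over fixed windows. That assumption, the window length, and the helper name `block_average` are all mine; if what matters is spikes or high-frequency content, this particular reduction would be the wrong one.

```python
import numpy as np

def block_average(series, window):
    """Average a (n, d) time series over consecutive windows of `window` samples.

    Trailing samples that do not fill a whole window are dropped.
    """
    n_blocks = len(series) // window
    trimmed = series[: n_blocks * window]
    # Reshape to (blocks, samples per block, features) and average each block.
    return trimmed.reshape(n_blocks, window, -1).mean(axis=1)

# Synthetic two-feature series: a slow sine wave and a slow linear trend.
t = np.arange(100_000)
series = np.column_stack([np.sin(2 * np.pi * t / 10_000), t * 1e-5])

reduced = block_average(series, window=100)
print(series.shape, "->", reduced.shape)
```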