I have a very large, sparse matrix (531K x 315K), so the total number of cells is ~167 billion. The non-zero values are all 1s, and there are only around 45K of them. Is there an efficient NMF package for this problem? I know there are a couple of packages for NMF, but they only work well for small data matrices. Any ideas would help. Thanks in advance.
Very Large and Very Sparse Non-Negative Matrix Factorization
6.6k views. Asked by mgokhanbakal.
scikit-learn will handle this easily!
Code:
Output:
Remarks:
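As a rough sketch of the scikit-learn route suggested above (not this answer's original code), the following builds a random binary sparse matrix and factorizes it with `sklearn.decomposition.NMF`. The helper names `random_binary_sparse` and `factorize`, the rank of 10, the solver settings, and the random data are all assumptions for illustration; the demo is scaled down so it runs quickly, and the shape arguments would be changed to try the sizes from the question.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import NMF


def random_binary_sparse(n_rows, n_cols, nnz, seed=0):
    """Hypothetical helper: CSR matrix with at most `nnz` ones at random positions."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, n_rows, size=nnz)
    cols = rng.integers(0, n_cols, size=nnz)
    data = np.ones(nnz, dtype=np.float64)
    X = sp.csr_matrix((data, (rows, cols)), shape=(n_rows, n_cols))
    # Duplicate (row, col) pairs are summed on construction; clamp them back to 1.
    X.data[:] = 1.0
    return X


def factorize(X, n_components=10, seed=0):
    """Hypothetical helper: fit NMF on a sparse matrix, return W, H and the error."""
    model = NMF(
        n_components=n_components,  # assumed rank, not given in the question
        init="random",
        solver="cd",
        max_iter=200,
        random_state=seed,
    )
    W = model.fit_transform(X)  # dense array, shape (n_rows, n_components)
    H = model.components_       # dense array, shape (n_components, n_cols)
    return W, H, model.reconstruction_err_


# Scaled-down demo; the question's matrix would be (531_000, 315_000) with ~45_000 ones.
X = random_binary_sparse(5000, 3000, 450)
W, H, err = factorize(X)
print(X.shape, X.nnz, W.shape, H.shape, err)
```

The input stays in sparse form throughout, so only the non-zero entries are stored. The factors `W` and `H` are dense, so their memory use grows with the chosen rank rather than with the number of cells in the input.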
Additional Constraints
As mentioned in the comments, the OP wants to add further constraints but has not yet specified them formally.
This would require a new implementation of an optimization procedure, including some theoretical groundwork (how much depends on the constraints).
As an alternative, the problem can be handled by general-purpose convex-programming solvers, e.g. formulated with cvxpy and solved by SCS. The alternating-minimization procedure still has to be implemented, because the joint problem over both factors is non-convex, and this approach will scale worse than the specialized sklearn implementation. But it might work for the OP's data.