I have a very large, very sparse matrix (531K x 315K), about 167 billion cells in total. The only non-zero values are 1s, and there are about 45K of them. Is there an efficient NMF package for this problem? I know of a couple of packages, but they only work well on small matrices. Any ideas would help. Thanks in advance.
Very Large and Very Sparse Non-Negative Matrix Factorization
Asked by mgokhanbakal
There is 1 answer below
scikit-learn will handle this easily!
Code:
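A minimal sketch of this approach, assuming the data is stored as a `scipy.sparse` CSR matrix and factorized with `sklearn.decomposition.NMF`. The helper names `make_sparse_binary` and `factorize`, the scaled-down 531 x 315 demo shape, and `n_components=5` are assumptions chosen for illustration, not taken from the answer.

```python
import numpy as np
import scipy.sparse as sps
from sklearn.decomposition import NMF


def make_sparse_binary(n_rows, n_cols, nnz, seed=0):
    """Build a random sparse 0/1 matrix in CSR format (hypothetical test data)."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, n_rows, size=nnz)
    cols = rng.integers(0, n_cols, size=nnz)
    X = sps.csr_matrix((np.ones(nnz), (rows, cols)), shape=(n_rows, n_cols))
    # Duplicate (row, col) pairs are summed on construction; reset them to 1.
    X.data[:] = 1.0
    return X


def factorize(X, n_components, seed=0):
    """Fit NMF on a sparse matrix and return W, H and the reconstruction error."""
    model = NMF(n_components=n_components, init="random",
                random_state=seed, max_iter=200)
    W = model.fit_transform(X)   # dense array, shape (n_rows, n_components)
    H = model.components_        # dense array, shape (n_components, n_cols)
    return W, H, model.reconstruction_err_


# Scaled-down demo: same aspect ratio as the question, 1000x smaller per axis.
X_demo = make_sparse_binary(531, 315, 45)
W_demo, H_demo, err_demo = factorize(X_demo, n_components=5)
print("X shape:", X_demo.shape, "nnz:", X_demo.nnz)
print("W shape:", W_demo.shape, "H shape:", H_demo.shape)
```

For the full problem, the same calls would take a shape of `(531_000, 315_000)` with about 45K non-zeros. The sparse input itself stays small, but `W` and `H` are dense, so memory grows with `n_components`.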
Output:
Remarks:
Additional Constraints
As mentioned in the comments, the OP wants to add additional constraints, but has not yet specified them formally.
This will require a whole new implementation of an optimization procedure, including some theoretical groundwork (depending on the constraints).
Alternatively, this can be solved with a general-purpose convex-programming solver, e.g. formulated with cvxpy and solved by SCS. The alternating-minimization procedure is still needed (the joint problem is non-convex), and it will scale worse than the specialized sklearn implementation, but it might work for the OP's data.