I have an 11 GB .csv file which I ultimately need as a big.matrix object. From what I have read, I think I need to create a file-backed big.matrix object, but I cannot figure out how to do this.
The file is too large for me to load directly into R and manipulate from there as I have done with smaller datasets. How do I produce a big.matrix object from the .csv file?
See if this can be of help. I post as an answer because it contains too much code for a comment.
The strategy is to read chunks of 10K rows at a time and coerce each chunk to a sparse matrix, then rbind those sub-matrices together.
It uses data.table::fread for speed and a function in package fpeek to count the number of lines in the data file. This function is also fast.
Test data
With the following test data all went alright. I also tried it with a bigger matrix.
The path is optional.
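The chunked strategy described above could look roughly like the sketch below. It is not the original code: the function name read_csv_chunked, its arguments, and the test file name are hypothetical, and it assumes a CSV with one header line, only numeric columns, and a trailing newline (so that the line count equals the number of data rows plus one). It relies on data.table::fread, fpeek::peek_count_lines and the Matrix package.

```r
library(data.table)
library(Matrix)
library(fpeek)

# Hypothetical helper: read a numeric CSV in chunks of `chunk` rows,
# coerce each chunk to a sparse matrix and stack the pieces with rbind.
read_csv_chunked <- function(filename, chunk = 10000L, path = ".") {
  filename <- file.path(path, filename)

  # Total number of lines, header included (assumes a trailing newline).
  n_rows <- fpeek::peek_count_lines(filename) - 1L

  # Read only the header to recover the column names.
  col_names <- names(data.table::fread(filename, nrows = 0L))

  # First data row of each chunk: 1, chunk + 1, 2 * chunk + 1, ...
  starts <- seq(1L, n_rows, by = chunk)

  pieces <- lapply(starts, function(s) {
    n <- min(chunk, n_rows - s + 1L)
    # skip = s jumps over the header plus the s - 1 rows already read.
    dt <- data.table::fread(filename, skip = s, nrows = n,
                            header = FALSE, col.names = col_names)
    Matrix::Matrix(as.matrix(dt), sparse = TRUE)
  })

  do.call(rbind, pieces)
}

# Small test file: 25000 rows of mostly zeros, written to a temp directory.
set.seed(2021)
test_dir <- tempdir()
m <- matrix(rbinom(25000 * 4, 1, 0.1), ncol = 4,
            dimnames = list(NULL, paste0("V", 1:4)))
data.table::fwrite(as.data.table(m), file.path(test_dir, "test.csv"))

res <- read_csv_chunked("test.csv", chunk = 10000L, path = test_dir)
dim(res)
```

Only one chunk is held as a dense data.table at any time; whether the stacked result fits in memory depends on how sparse the data actually are.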