I've recently started using R for data analysis. I've run into a problem ranking a large query dataset (~1 GB in ASCII form, more than my laptop's 4 GB of RAM in binary form). Using a bigmemory::big.matrix
for this dataset is a nice solution, but passing such a matrix m to the gbm()
or randomForest()
functions causes the error:
cannot coerce class 'structure("big.matrix", package = "bigmemory")' into a data.frame
class(m) outputs the following:
[1] "big.matrix"
attr(,"package")
[1] "bigmemory"
Is there a way to correctly pass a big.matrix
instance into these algorithms?
I obviously can't test this using data of your scale, but I can reproduce your errors by using the formula interface of each function:
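For instance, a small sketch along these lines (the data are made up purely for illustration) triggers the same coercion error, because the formula interface tries to build a data.frame from the big.matrix:

```r
library(bigmemory)
library(randomForest)

# a tiny illustrative big.matrix; column "y" is a fake response
m <- as.big.matrix(matrix(rnorm(100), nrow = 20, ncol = 5),
                   dimnames = list(NULL, c("y", paste0("x", 1:4))))

# the formula interface attempts as.data.frame(m) internally and fails:
randomForest(y ~ ., data = m)
# Error: cannot coerce class 'structure("big.matrix",
#   package = "bigmemory")' into a data.frame
```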
Not using the formula interface for
randomForest
is fairly common advice for large data sets; it can be quite inefficient. If you read ?gbm
, you'll see a similar recommendation steering you towards gbm.fit
for large data as well.
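As a rough sketch of the non-formula route (assuming a continuous response in the first column; note that both functions still expect ordinary matrices/vectors, so you may need to pull the data, or a subsample of it, out of the big.matrix into RAM with subscripting):

```r
library(bigmemory)
library(randomForest)
library(gbm)

m <- as.big.matrix(matrix(rnorm(100), nrow = 20, ncol = 5))

# subscripting a big.matrix returns an ordinary R matrix/vector
X <- m[, -1]  # predictors
y <- m[, 1]   # response

# non-formula interfaces avoid the data.frame coercion entirely
rf  <- randomForest(x = X, y = y)
fit <- gbm.fit(x = X, y = y, distribution = "gaussian")
```

If even the extracted matrix is too large for RAM, you'd likely need to subsample rows before extraction, since neither algorithm operates on file-backed storage directly.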