How to run Dirichlet Regression with a big data set in R?


I would like to run a Dirichlet regression on a large data set using the DirichReg package in R. I currently have a data.frame with 37 columns and ~13,000,000 rows.

However, running this model on all of my data instantly crashes R. I am using a Linux machine with 16 cores and 128 GB of memory. Even cutting my data down to only 1,000 rows still causes R to crash and restart almost immediately.

Am I doing something wrong? Is there any way I can parallelize this operation to get this model to run?

I am running a model with the following syntax:

data.2 <- data

data.2$y_variable <- DR_data(data[,c(33:35)])

model <- DirichReg(y_variable ~ x_variable, data.2)

I have to create y_variable in a separate data.frame, data.2, because running data$y_variable <- DR_data(data[,c(33:35)]) directly crashes R. I have no idea why.
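One way to sidestep the data.2 <- data copy entirely (a sketch, not tested on the real 13M-row data; x_variable and the composition columns are the placeholders from above, here given toy values) is to build a minimal model frame holding only the predictor and the response, so the full 37-column frame is never duplicated:

```r
library(DirichReg)

# Toy stand-in for the real data.frame (names are placeholders)
set.seed(1)
data <- data.frame(x_variable = rnorm(100),
                   p1 = runif(100), p2 = runif(100), p3 = runif(100))

# Keep only the columns the model actually uses, instead of copying everything
data.small <- data.frame(x_variable = data$x_variable)
data.small$y_variable <- DR_data(data[, c("p1", "p2", "p3")])  # rows are normalized to sum to 1

model <- DirichReg(y_variable ~ x_variable, data.small)
```

With 13M rows, the peak memory use is then dominated by the 4 columns the model needs rather than two copies of all 37.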

1 Answer

Answer from user438383:

It's a bit of a guess why R is 'crashing', but if it is a RAM issue, you can update the table by reference with data.table instead of copying the data (the data.2 <- data line, followed by the $<- assignment, forces R to duplicate the whole data.frame):

library(data.table)
setDT(data)  # converts data to a data.table in place, without copying
data[, y_variable := DR_data(data[, 33:35, with = FALSE])]
# note: DR_data() returns a matrix-like object; if := refuses it,
# keep the result in a separate variable and refer to it in the formula
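To check that this really avoids a copy, base R's tracemem() prints a message whenever the traced object is duplicated; with setDT() and := it should stay silent (a toy sketch, independent of DirichReg):

```r
library(data.table)

df <- data.frame(a = rnorm(5), b = rnorm(5))
tracemem(df)        # from now on, every copy of df prints a message

setDT(df)           # in-place conversion to data.table: no copy reported
df[, ab := a + b]   # new column added by reference: no copy reported

untracemem(df)
```

By contrast, df$ab <- df$a + df$b on a traced data.frame would print a tracemem message, because the $<- assignment duplicates the object.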