I would like to run a Dirichlet regression on a large data set using the DirichReg package in R. I currently have a data.frame with 37 columns and ~13,000,000 rows.
However, running this model on all of my data crashes R instantly. I am using a Linux machine with 16 cores and 128 GB of memory. Even cutting my data down to only 1,000 rows still causes R to crash and restart almost immediately.
Am I doing something wrong? Is there any way I can parallelize this operation to get this model to run?
I am running a model with the following syntax:
data.2 <- data
data.2$y_variable <- DR_data(data[,c(33:35)])
model <- DirichReg(y_variable ~ x_variable, data.2)
I have to create y_variable in a separate data.2 data.frame, because running data$y_variable <- DR_data(data[,c(33:35)]) crashes R. I have no idea why this happens.
It's a bit of a guess why this is "crashing" R, but if it's a RAM issue, you can update the table by reference rather than making a copy of the entire data set:
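A minimal sketch of the by-reference approach with data.table (the column names x_variable and p1:p3 are placeholders standing in for your real columns, and the toy table stands in for the 13M-row one). Note that := assigns by reference, so only the new column is allocated; the other 36 columns are not copied, unlike data$y_variable <- ... on a data.frame. Because DR_data() returns a matrix-like object that a data.table column doesn't hold cleanly, one option is to build the response separately and hand DirichReg a small model frame containing only the columns the formula needs:

```r
library(data.table)
library(DirichletReg)

# Toy stand-in for the real 13M-row table; swap in your own columns
dt <- data.table(x_variable = rnorm(500),
                 p1 = runif(500), p2 = runif(500), p3 = runif(500))

# := adds the column in place -- no copy of the existing columns
dt[, total := p1 + p2 + p3]

# Keep the model frame down to just the response and predictor columns,
# so only a small object (not the full table) is ever duplicated
model.frame <- data.frame(x_variable = dt$x_variable)
model.frame$y_variable <- DR_data(as.data.frame(dt[, .(p1, p2, p3)]))

model <- DirichReg(y_variable ~ x_variable, data = model.frame)
```

If the full table really doesn't fit in memory alongside the model fit, fitting on a subset of columns like this is usually the first thing to try before reaching for parallelization, since DirichReg itself does not fit in parallel.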