Is there a way to handle "cannot allocate vector of size" issue without dropping data?


Unlike a previous question about this error, my case is different, which is why I'm asking. I have an already cleaned dataset of 120,000 observations of 25 variables, and I am supposed to analyze all of it with logistic regression and a random forest. However, I get the error "cannot allocate vector of size 98 GB", whereas my friend doesn't.

The summary above says most of it. I even tried reducing the number of observations to 50,000 and the number of variables to 15 (using 5 of them in the regression), and it still failed. However, when I sent the script with the shortened dataset to a friend, she could run it. This is odd because I have a 64-bit system with 8 GB of RAM, while she has only 4 GB. So the problem appears to lie with my setup.
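For what it's worth, here is a small sketch of the checks I could run (plain base R only; nothing here is from my actual script) to see how much memory the loaded data really takes and whether a column was read with the wrong type, since a numeric column read as character or factor can blow up the size of the model matrix:

# Sketch: inspect the in-memory size and column types of the data frame
pd_data <- read.csv2("pd_data_v2.csv", stringsAsFactors = FALSE)

# Total size of the data frame in memory
print(object.size(pd_data), units = "MB")

# Per-column classes: check that numeric columns were not read as character
str(pd_data)
sapply(pd_data, class)

# Trigger garbage collection and report current memory use
gc()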

# Load the data and create a 70/30 train/test split
pd_data <- read.csv2("pd_data_v2.csv")
split <- rsample::initial_split(pd_data, prop = 0.7)
train <- rsample::training(split)
test  <- rsample::testing(split)

# Fit the logistic regression on the training split
log_model <- glm(default ~ profit_margin + EBITDA_margin + payment_reminders,
                 data = train, family = "binomial")
log_model

The result should be a logistic model where I can see the coefficients and measure its accuracy, and then make adjustments.
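In other words, something along these lines is what I'm aiming for (a sketch only; it assumes `default` is coded as 0/1 and uses a plain 0.5 cutoff):

# Coefficients, standard errors, p-values
summary(log_model)

# Predicted default probabilities on the held-out test set
pred_prob  <- predict(log_model, newdata = test, type = "response")
pred_class <- ifelse(pred_prob > 0.5, 1, 0)

# Simple accuracy and confusion table at a 0.5 cutoff
mean(pred_class == test$default)
table(predicted = pred_class, actual = test$default)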
