Selecting sample size for imbalanced data for a Random Forest in R


I have a large dataset (about 10000 rows) and I'm trying to fit a classification random forest that I intend to use for prediction. My data is very imbalanced: for the outcome variable I'm trying to predict, about 89% of the rows are marked "1" and the remainder are "0". The code I am using is as follows:

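library(randomForest)

# Note: the argument is named sampsize; a misspelled name such as samplesize is silently ignored
RFTry <- randomForest(as.factor(OutcomeVariable) ~ ., data = df, importance = TRUE,
                      ntree = 200, sampsize = c(500, 500))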

I am unsure what sample size I should be using. Should I sample the same number of rows from each outcome class, or different numbers? And how many rows should I be sampling? The table below shows the number of rows in each class.

> table(df$OutcomeVariable)

    0     1 
10228  1234 
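
For reference, one starting point I am considering is a balanced draw: the same number of rows from each class for every tree, capped at the size of the smaller class. Below is a minimal sketch of that idea, not a recommendation. The data frame is simulated to match the class counts above, and the predictors x1 and x2 are hypothetical stand-ins for my real columns. The model fit only runs if the randomForest package is installed.

```r
# Simulated stand-in for df: class counts match the table above,
# x1 and x2 are hypothetical predictors (pure noise).
set.seed(42)
df <- data.frame(
  OutcomeVariable = factor(rep(c("0", "1"), times = c(10228, 1234))),
  x1 = rnorm(11462),
  x2 = rnorm(11462)
)

# Size of the smaller class
class_counts <- table(df$OutcomeVariable)
n_min <- min(class_counts)

# Balanced per-tree sample: n_min rows from each class
balanced_sampsize <- rep(n_min, nlevels(df$OutcomeVariable))
print(balanced_sampsize)

# Fit only if the package is available
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(
    OutcomeVariable ~ ., data = df,
    ntree = 200,
    strata = df$OutcomeVariable,   # stratify the per-tree sample by class
    sampsize = balanced_sampsize   # one entry per class level
  )
  print(fit$confusion)
}
```

Values smaller than n_min (such as the 500 per class in my call above) would also give balanced trees, each built on less data, so I would presumably have to compare the out-of-bag confusion matrices for a few settings.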

Thank you!
