Selecting sample size for imbalanced data for a Random Forest in R

489 Views Asked by Fiona At 07 June 2025 at 14:53

I have a large dataset (about 10000 rows) and I'm trying to fit a classification random forest that I intend to use for predictions. My data is very imbalanced: for the outcome variable I'm trying to predict, about 89% of the rows are marked "0" and the remainder are marked "1". The code I am using is as follows:

RFTry <- randomForest(as.factor(OutcomeVariable) ~ ., data = df, importance = TRUE,
                      ntree = 200, sampsize = c(500, 500))

I am unsure what per-class sample sizes I should be using. Should I sample the same number of rows from each outcome class, or different numbers? And how many rows should I draw from each? The table below shows the number of rows in each class.

> table(df$OutcomeVariable)

    0     1 
10228  1234

Thank you!
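
One way to explore the question is to derive the per-class sample sizes from the class table rather than hard-coding them. The sketch below is only an illustration under stated assumptions: it builds a synthetic stand-in for `df` with the class counts from the table and two made-up predictors (`x1`, `x2`), and it uses a common heuristic, not a rule, of drawing the same number of rows from every class, capped at the size of the smallest class. It relies on the `sampsize` and `strata` arguments of `randomForest`, and the fit is skipped if the package is not installed.

```r
# Hypothetical stand-in for df: same class counts as the table above,
# with two made-up numeric predictors (x1, x2) purely for illustration.
set.seed(42)
counts <- c("0" = 10228, "1" = 1234)
df <- data.frame(
  OutcomeVariable = factor(rep(names(counts), times = counts)),
  x1 = rnorm(sum(counts)),
  x2 = rnorm(sum(counts))
)

# Per-class row counts, in factor-level order ("0", then "1").
class_counts <- table(df$OutcomeVariable)

# Heuristic (an assumption, not a rule): draw the same number of rows
# from every class, capped at the size of the smallest class.
n_min <- min(class_counts)
balanced_sampsize <- rep(n_min, length(class_counts))

# Fit only if the randomForest package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf_balanced <- randomForest::randomForest(
    OutcomeVariable ~ ., data = df,
    importance = TRUE, ntree = 200,
    strata = df$OutcomeVariable,   # stratify each tree's sample by class
    sampsize = balanced_sampsize   # rows drawn per class, per tree
  )
  print(rf_balanced$confusion)     # out-of-bag confusion matrix
}
```

Smaller equal draws, such as the 500 per class in the question, are also reasonable to try. Comparing the per-class out-of-bag error in `rf_balanced$confusion` across a few candidate values is one way to choose between them.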
