How to solve the error "Too few observations." when using ROSE to balance data in R?


I am trying to use the ROSE library in R to rebalance the target variable in my dataset. Here is some information about my dataset:

  • My original dataset has 132056 records in total.
  • There are 279 cases (0.21%) of the minority class in the target variable.
  • There are 131777 cases (99.79%) of the majority class in the target variable.

I would like to undersample the dataset so that the share of the minority class increases to 5%.
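For reference, the N in the call below follows from the class counts: undersampling keeps all 279 minority rows, so for them to make up 5% of the result the total must be 279 / 0.05 = 5580 rows, of which 5301 are majority rows. A small sketch of that arithmetic in R (the variable names are illustrative only):

```r
# Class counts from the question
n_minor <- 279
n_major <- 131777

# Current minority share (about 0.21%)
current_share <- n_minor / (n_minor + n_major)

# Desired minority share after undersampling
target_share <- 0.05

# Undersampling keeps every minority row, so the new total is n_minor / target_share
N <- round(n_minor / target_share)
n_major_kept <- N - n_minor

cat("current minority share:", round(100 * current_share, 2), "%\n")
cat("N =", N, "| majority rows kept =", n_major_kept, "\n")
```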

Here is my code:

library(ROSE)
df_Under <- ovun.sample(Target ~ ., data = df, method = "under", N = 5580, seed = 1)

However, after running the code above, I get the following error message:

"Error in (function (formula, data, method, subset, na.action, N, p = 0.5,  :Too few observations." 

I tried the other ROSE methods, "over" and "both", but the same error occurs.

How can I fix this problem?

Kind regards,


There are 3 answers below.

Answer 1

I was facing the same problem. It turned out to be the dataset itself, which had columns (variables) containing NA/NaN values.

Please try running the code again after removing the NAs.
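A minimal sketch of that NA check, on a hypothetical toy data frame. It assumes that ovun.sample() drops incomplete rows through R's default na.action (na.omit) when it builds the model frame from Target ~ ., so a single column that is entirely NA could leave zero rows, which would be consistent with the "Too few observations." error. Dropping all-NA columns before dropping incomplete rows keeps as much data as possible:

```r
# Hypothetical toy data: column x2 is entirely NA, as can happen after a faulty import
toy <- data.frame(
  Target = factor(c("no", "no", "yes", "no", "yes")),
  x1     = c(1.2, 3.4, 2.2, NA, 0.5),
  x2     = NA_real_
)

# Number of missing values in each column (is.na() is also TRUE for NaN)
na_per_col <- colSums(is.na(toy))
print(na_per_col)

# Because x2 is all NA, no row is complete, so na.omit() would drop everything
print(sum(complete.cases(toy)))

# Step 1: drop columns that contain only NA
toy_clean <- toy[, na_per_col < nrow(toy), drop = FALSE]

# Step 2: drop the rows that still have an NA in some remaining column
toy_clean <- toy_clean[complete.cases(toy_clean), , drop = FALSE]
print(dim(toy_clean))
```

On the real data, the same two steps would be applied to df before rerunning the original ovun.sample() call on the cleaned frame.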

Let me know if this helps.

Answer 2

I believe you want your code to use p = 0.05 (5%) rather than the default p = 0.5 (50%) that your call currently falls back on, and to oversample to bring up the sample size of the minority class, as you mentioned in your post:

df_Under <- ovun.sample(Target ~ ., data = df, method = "over", N = 5580, seed = 1, p = 0.05)
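To make the difference between p = 0.05 and p = 0.5 concrete, here is a plain base-R sketch of random undersampling. It is not ROSE's implementation; the helper undersample_to_share() and the simulated data (built from the class counts in the question) are hypothetical. It keeps every minority row and samples just enough majority rows for the minority share to equal p:

```r
# Hypothetical helper: random undersampling in base R (not ROSE's implementation)
undersample_to_share <- function(data, target, minority, p, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  is_min   <- data[[target]] == minority
  min_rows <- which(is_min)
  maj_rows <- which(!is_min)
  # Keep all minority rows; choose the majority count so the minority share equals p
  n_maj <- round(length(min_rows) * (1 - p) / p)
  if (n_maj > length(maj_rows)) stop("p is too small to reach by undersampling alone")
  keep <- c(min_rows, sample(maj_rows, n_maj))
  data[keep, , drop = FALSE]
}

# Simulated data with the class counts from the question
set.seed(42)
df_sim <- data.frame(
  Target = factor(rep(c("minor", "major"), times = c(279, 131777))),
  x      = rnorm(132056)
)

df_small <- undersample_to_share(df_sim, "Target", "minor", p = 0.05)
print(nrow(df_small))
print(prop.table(table(df_small$Target)))
```

With p = 0.05 the sketch keeps 279 minority and 5301 majority rows (5580 in total); with p = 0.5 it would keep only 279 majority rows.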

Answer 3

data.balanced.under <- ovun.sample(Target ~ ., data = df, method = "under", p = 0.5)$data

This will solve your problem.
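Since this call sets p = 0.5, the minority class should end up at roughly 50% of the result rather than the 5% asked about in the question. A quick way to confirm what any balanced result contains is to tabulate the target column; class_shares() is a hypothetical helper, and the data frame below is a made-up stand-in for the ovun.sample() output:

```r
# Hypothetical helper: proportion of each class in the target column
class_shares <- function(data, target) {
  prop.table(table(data[[target]]))
}

# Made-up stand-in for a balanced result (e.g. the $data element returned above)
balanced <- data.frame(Target = factor(rep(c("minor", "major"), times = c(279, 279))))
shares <- class_shares(balanced, "Target")
print(round(shares, 3))
```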