Can I correct the coefficient standard errors after oversampling my data?

Asked by cbowers on 16 March 2023 at 00:40

I am trying to fit a fixed effects linear regression to my data and interpret the coefficients. My dataset is imbalanced (~97% negative cases). This kept me from fitting the model and estimating a coefficient for every independent variable, so I used SMOTE to oversample the positive cases, which roughly doubled the size of my dataset.

I care much more about the coefficient values and their standard errors than about the model's predictive accuracy. The question I am trying to answer is "what is the effect of x on y?" But because my SMOTE dataset is twice as large as my original dataset, my standard errors are artificially small (overconfident). Is there a way to correct for this, keeping the coefficient estimates from the SMOTE data while calculating the standard errors from the original data?
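To see why a doubled dataset makes the standard errors look overconfident, here is a minimal numpy sketch. It uses exact row duplication as a simplified stand-in for SMOTE; SMOTE interpolates new synthetic points rather than copying rows, so this only approximates its effect. The data are simulated, and the helper name `ols_se` is hypothetical.

```python
import numpy as np

def ols_se(X, y):
    """OLS coefficients and classical standard errors (X includes an intercept column)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)        # residual variance with n - p degrees of freedom
    cov = sigma2 * np.linalg.inv(X.T @ X)   # classical covariance matrix of beta
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(size=n)      # simulated data, true slope 0.5

beta_orig, se_orig = ols_se(X, y)

# Duplicate every row: a crude stand-in for oversampling that doubles the dataset
X_dup, y_dup = np.vstack([X, X]), np.concatenate([y, y])
beta_dup, se_dup = ols_se(X_dup, y_dup)

print(beta_orig, beta_dup)  # same coefficients (up to rounding)
print(se_orig / se_dup)     # each ratio equals sqrt((2n - p) / (n - p)), close to sqrt(2)
```

The point estimates do not move, but every standard error shrinks by about sqrt(2), purely because the row count doubled.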



Answer by Next Door Engineer on 16 March 2023 at 11:47

You have to correct for this, for example by recalibrating the predicted probabilities.
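The answer does not say which recalibration it has in mind. One common choice, assuming a model that outputs class probabilities (for example a logistic regression) fitted on the oversampled data, is the prior-shift correction: it rescales the odds from the training positive rate back to the original positive rate. This is a minimal sketch; the function name `recalibrate` and the example rates are hypothetical. Note that it adjusts predictions only and leaves the coefficient standard errors untouched.

```python
import numpy as np

def recalibrate(p, pi_original, pi_sample):
    """Map probabilities from a model trained at positive rate pi_sample
    back to the original positive rate pi_original (prior-shift correction)."""
    p = np.asarray(p, dtype=float)
    pos = p * (pi_original / pi_sample)                    # rescale the positive-class term
    neg = (1 - p) * ((1 - pi_original) / (1 - pi_sample))  # rescale the negative-class term
    return pos / (pos + neg)

# Hypothetical rates: about 3% positives originally, about 50% after oversampling
print(recalibrate([0.5, 0.9], pi_original=0.03, pi_sample=0.5))
```

A prediction equal to the oversampled base rate (0.5 here) maps back to the original base rate (0.03), and the ordering of predictions is preserved.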

Alternatively, you can fit a weighted regression:

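import numpy as np
from sklearn.linear_model import LinearRegression

# original_data_flag: boolean NumPy array, True for rows from the original (non-SMOTE) data
weights = np.where(original_data_flag, 1 / np.mean(original_data_flag), 1 / np.mean(~original_data_flag))

lm = LinearRegression()
lm.fit(x, y, sample_weight=weights)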
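scikit-learn's `LinearRegression` returns coefficients but does not report standard errors, so the weighted fit above does not by itself give the standard errors the question asks about. The sketch below computes weighted least squares coefficients together with classical standard errors in plain numpy, treating the weights as precision (analytic) weights. The helper name `wls_se`, the simulated data, and the randomly generated `original_data_flag` are hypothetical.

```python
import numpy as np

def wls_se(X, y, w):
    """Weighted least squares coefficients and classical standard errors.

    X: (n, p) design matrix including an intercept column.
    w: (n,) positive weights, treated as precision weights.
    """
    n, p = X.shape
    sw = np.sqrt(w)
    Xw, yw = X * sw[:, None], y * sw           # whiten each row by sqrt(weight)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta                     # weighted residuals
    sigma2 = resid @ resid / (n - p)           # degrees of freedom count rows, not sum(w)
    cov = sigma2 * np.linalg.inv(Xw.T @ Xw)
    return beta, np.sqrt(np.diag(cov))

# Hypothetical example: simulated data and a made-up original/synthetic flag
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 - 0.3 * x + rng.normal(size=n)
original_data_flag = rng.random(n) < 0.5
weights = np.where(original_data_flag, 1 / np.mean(original_data_flag),
                   1 / np.mean(~original_data_flag))

beta, se = wls_se(X, y, weights)
print(beta, se)
```

Under this convention, multiplying every weight by a constant leaves both the coefficients and the standard errors unchanged, and the residual degrees of freedom still count every row, including the synthetic ones.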
