I am trying to fit a fixed effects linear regression to my data and interpret the coefficients. I have an imbalanced dataset (~97% negative cases), which made it hard to fit the model and estimate a coefficient for every independent variable, so I used SMOTE to oversample the positive cases, which roughly doubled the size of my dataset. I care way more about the coefficient values and standard errors than about the predictive accuracy of the model: the question I am trying to answer is "what is the effect of x on y?" But because my SMOTE dataset is twice as large as my original dataset, my standard errors are artificially small (overconfident). Is there a way to correct for this, keeping the SMOTE coefficient estimates while calculating standard errors based on the original data?
Can I correct the coefficient standard errors after oversampling my data?
Asked by cbowers (109 views)
There is 1 answer below.
You have to correct for this, for example by recalibrating the predicted probabilities.
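A minimal sketch of what recalibration could look like, assuming the model outputs probabilities and that oversampling only changed the class balance. The function name `recalibrate` and the rates used in the example are my own illustration, not something from the question: with ~3% positives and a dataset roughly doubled by synthetic positives, the resampled positive rate is about 0.515. Note that this adjusts predictions only; it does not by itself change the coefficient standard errors.

```python
def recalibrate(p_resampled, resampled_rate, original_rate):
    """Map a probability from a model fit on oversampled data back to the
    original class balance (a prior-shift correction via Bayes' rule).

    p_resampled    : predicted P(y=1) from the model trained on resampled data
    resampled_rate : share of positives in the resampled training data
    original_rate  : share of positives in the original data
    """
    # How much each class was over- or under-represented by the resampling
    pos_ratio = original_rate / resampled_rate
    neg_ratio = (1.0 - original_rate) / (1.0 - resampled_rate)

    numerator = p_resampled * pos_ratio
    return numerator / (numerator + (1.0 - p_resampled) * neg_ratio)


# Hypothetical numbers: ~3% positives originally, ~51.5% after oversampling
for p in (0.1, 0.5, 0.9):
    print(p, recalibrate(p, resampled_rate=0.515, original_rate=0.03))
```

If both rates are equal, the function returns the input unchanged, which is a quick sanity check.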
Alternatively, you can fit a weighted regression.
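A rough sketch of the weighted-regression route, assuming you fit on the original rows only and up-weight the positive cases instead of adding synthetic ones. Everything here is illustrative: `weighted_ols` is a hypothetical helper written with NumPy, the data are simulated, and the class weights are just one possible choice. The standard errors come from a heteroskedasticity-robust sandwich estimator (HC0), which does not change if all weights are rescaled, so they reflect the original number of rows. Whether weighting on the outcome is appropriate for the "effect of x on y" question is a separate issue that the sketch does not settle.

```python
import numpy as np


def weighted_ols(X, y, w):
    """Weighted least squares with HC0 sandwich standard errors.

    X : (n, p) design matrix, including an intercept column
    y : (n,) outcome
    w : (n,) non-negative observation weights
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)

    # Coefficients: solve (X'WX) beta = X'Wy
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))

    # Sandwich covariance: (X'WX)^-1 [sum_i w_i^2 e_i^2 x_i x_i'] (X'WX)^-1
    resid = y - X @ beta
    bread = np.linalg.inv(XtWX)
    meat = X.T @ ((w * resid)[:, None] ** 2 * X)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


# Simulated data with a rare binary outcome (made up for illustration)
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
prob = 1.0 / (1.0 + np.exp(-(-3.5 + 0.8 * x)))
y = (rng.random(n) < prob).astype(float)
X = np.column_stack([np.ones(n), x])

# One possible weighting: give both classes the same total weight
n_pos, n_neg = (y == 1).sum(), (y == 0).sum()
w = np.where(y == 1, n_neg / n_pos, 1.0)

beta, se = weighted_ols(X, y, w)
print("coefficients:", beta)
print("robust standard errors:", se)
```

By contrast, physically duplicating every row and refitting would shrink these standard errors by a factor of about sqrt(2), which is the overconfidence described in the question.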