I have a large survey dataset (N = 12,000). I use weights in my regressions because the respondents in this sample are the ones who gave blood for analysis, out of the whole eligible sample. My results made sense until I started analyzing subgroups, e.g., respondents with particular genetic markers. I suspect this is because I am still weighting the regression when I shouldn't be. My reasoning is that, since the genetic marker defines a subgroup, that subgroup is already a sample representing the population, and the weights are only introducing noise. I have looked for reliable sources and explanations but haven't found anything so far. Can you help?
Am I applying weights indiscriminately to my survey data?
313 Views Asked by Yacila At
Let me try to help a bit here; I will answer to the best of my knowledge. I am most familiar with interpretable machine learning, especially generalised additive models, which is why I have read parts of "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman.
Survey weights are generally used to correct for selection bias and make the survey results more representative of the population. The weights are usually derived based on the design of the survey and the probability of each individual being selected in the survey. They allow you to extrapolate findings from the sample to the population that the sample is supposed to represent.
When you subset the data (e.g., by selecting only respondents with certain genetic markers), the original weights may no longer be appropriate because the subgroup may not represent the population in the same way the overall sample does.
In your case, if the genetic marker is not related to the likelihood of being selected into the sample (i.e., it does not affect the survey design), then the original weights can still be used when analyzing this subgroup. This is because, from the perspective of the survey design, this subgroup is just a random subset of the overall sample.
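However, if the genetic marker is related to the likelihood of being selected into the sample, you might need to adjust the weights. One possibility is to rescale the weights so that they sum to 1 within this subgroup. This effectively treats the subgroup as a new population, under the assumption that the survey design is the same within this subgroup. Keep in mind that multiplying all weights by the same constant leaves weighted regression coefficients unchanged, so rescaling only matters for quantities that depend on the scale of the weights.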
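To make the rescaling idea concrete, here is a minimal sketch in plain Python. The data, the weights, and the helper `weighted_fit` are all made up for illustration; it is a hand-rolled weighted fit of a line with one predictor, not the statsmodels implementation. It rescales the subgroup weights to sum to 1 and compares the fit before and after.

```python
def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical subgroup: predictor, outcome, and original survey weights.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
w = [120.0, 80.0, 200.0, 50.0, 150.0]

# Rescale so the weights sum to 1 within the subgroup.
total = sum(w)
w_norm = [wi / total for wi in w]

fit_raw = weighted_fit(x, y, w)
fit_norm = weighted_fit(x, y, w_norm)

print("sum of rescaled weights:", sum(w_norm))
print("original weights fit:   ", fit_raw)
print("rescaled weights fit:   ", fit_norm)
```

The two fits agree up to floating-point error, because the constant cancels in every weighted mean and ratio. If the subgroup results look odd, the cause is therefore more likely the relative sizes of the weights than their overall scale.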
Furthermore, the weights could indeed introduce noise to your analyses. Survey weights are typically associated with larger standard errors because they reflect the variability in the sampling design. This means that when you apply these weights to your regression analyses, your standard errors might increase, leading to wider confidence intervals and potentially non-significant results.
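One rough way to gauge how much precision unequal weights cost is Kish's approximate effective sample size, (sum of weights)^2 / (sum of squared weights). This is not something discussed above; it is a common rule of thumb that I am adding as an assumption. It is derived for weighted means rather than regression coefficients, so treat it only as a rough guide. The function name and the example weights are hypothetical.

```python
def kish_effective_n(weights):
    """Kish's approximation: (sum w)^2 / sum(w^2)."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return s1 * s1 / s2

# Hypothetical subgroup of 100 respondents.
equal = [1.0] * 100                   # everyone weighted the same
unequal = [1.0] * 90 + [10.0] * 10    # a few respondents carry large weights

n_eff_equal = kish_effective_n(equal)
n_eff_unequal = kish_effective_n(unequal)

print("equal weights:  ", n_eff_equal)
print("unequal weights:", round(n_eff_unequal, 1))
```

With equal weights the effective size equals the actual subgroup size. With a handful of very large weights it drops to roughly a third of it, which is the kind of loss that shows up as wider confidence intervals in a small subgroup.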
For more specific guidance on weighted regression and survey sampling, "Sampling: Design and Analysis" by Sharon L. Lohr and "Complex Surveys: A Guide to Analysis Using R" by Thomas Lumley are good options.
Are you currently coding it in Python using statsmodels? The WLS class requires both a y and an X input (named endog and exog in statsmodels), as well as a weights parameter, which is an array-like object of weights.