I'm trying to set up a survey object using srvyr. The full dataset contains rows for sampled units where no response was achieved as well as rows where a response was achieved. My weights column only contains values for the achieved sample; where no response was achieved for a sampled unit, the weight is NA. My question is: should the survey object be created after filtering out the unachieved cases (i.e. only including the achieved sample), or should it include all cases (achieved and unachieved sample)? In the latter case a value would have to be assigned in the weight column for the unachieved sample, so what weight would one give the non-respondents (0, perhaps)? Any insights would be greatly welcomed.
I created a survey object first using only the achieved sample but this created issues with lonely PSUs. I then tried to create the survey object using the full sample (i.e. both responding and non-responding cases) but the NA values in the weights column for the non-respondents prevented the object from being created.
From a statistical point of view, it would be best to give the non-responding units their sampling weights. If you've done a probability sample, then you know the sampling weights for everyone in the sample you designed, whether they responded or not.
This obviously won't fix any real problems, but it would let you do raking/post-stratification to reduce non-response bias, and it would let you run analyses that give estimates for the responding subpopulation.
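A minimal sketch of what that could look like in srvyr. The data are made up, and the column names (`stratum`, `psu`, `samp_wt`, `responded`, `y`) are assumptions for illustration only; substitute whatever your own design variables are called. The idea is to build the design on the full sample with the sampling weights, and only then restrict to respondents, so the design still knows about every sampled PSU.

```r
library(dplyr)
library(srvyr)

# Hypothetical toy sample: 2 strata, 2 PSUs per stratum, 2 units per PSU.
# In stratum 1, nobody in PSU 2 responded, so filtering the data first
# would leave that stratum with a single PSU.
samp <- data.frame(
  stratum   = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu       = c(1, 1, 2, 2, 3, 3, 4, 4),
  samp_wt   = c(10, 10, 10, 10, 20, 20, 20, 20),  # sampling weights, known for everyone
  responded = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE),
  y         = c(3, 5, NA, NA, 4, NA, 6, 2)        # outcome, NA for non-respondents
)

# Build the design on the FULL sample, using the sampling weights
full_design <- samp %>%
  as_survey_design(ids = psu, strata = stratum, weights = samp_wt)

# Restrict to respondents AFTER creating the design; filter() on a
# survey object subsets it as a domain rather than discarding the design info
resp_design <- full_design %>%
  filter(responded)

# Estimate for the responding subpopulation
est <- resp_design %>%
  summarise(mean_y = survey_mean(y, na.rm = TRUE))

print(est)
```

Any raking or post-stratification step would then be applied to a design built this way, using population totals you actually have.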