anesrake error: "no variables are off by more than ____" when they are


I need to weight the observations in a sample based on the marginal distributions of four demographic characteristics from a broader population. I'm currently using the package anesrake to do so.

The population info is stored in targets, a list of 4 elements: one named numeric vector for each respondent attribute I want to weight on. The names of each vector are the categories of that attribute. I create targets here:

quota_age    <- c(0.30, 0.33, 0.37)
quota_race   <- c(0.62, 0.12, 0.17, 0.5, 0.3)
quota_gender <- c(0.52, 0.48)
quota_ed     <- c(0.41, 0.29, 0.19, 0.11)

names(quota_age)    <- c("18 to 34", "35 to 54", "55+")
names(quota_race)   <- c("White non-Hispanic", "Black non-Hispanic", "Hispanic", "Asian", "Other")
names(quota_gender) <- c("Female", "Male")
names(quota_ed)     <- c("HS or less", "Some college", "Bachelors", "Advanced")

targets <- list(quota_age, quota_race, quota_gender, quota_ed)

The survey file (m1b) is a data frame containing demographic info and a unique ID for each respondent (link to google sheet here). Here are the first few obs:

> head(m1b)
         ResponseId     quota_ed quota_age quota_gender         quota_race
1 R_3McITJbfcFuwc9x Some college  18 to 34       Female White non-Hispanic
2 R_2q3oeAbZgCZ5YcZ    Bachelors       55+       Female White non-Hispanic
3 R_YSVccSQ1xJ6zuDv     Advanced  35 to 54       Female White non-Hispanic
4 R_DubbKu7uJicbpQd Some college  35 to 54         Male White non-Hispanic
5 R_5zj5CNu598lCwRX    Bachelors       55+         Male              Other
6 R_21mPGFS7kHX2ELm     Advanced       55+       Female White non-Hispanic

Using the anesrake package, I want to construct a new variable called weight that I can use to account for differences between the population and sample marginal distributions in later analyses.

But when I call the anesrake function like so (the pctlim argument is extremely small to exaggerate my point):

library(anesrake)

raking <- anesrake(inputter     = targets,
                   dataframe    = m1b,
                   caseid       = m1b$ResponseId,
                   choosemethod = "total",
                   type         = "pctlim",
                   pctlim       = 0.0000001)

I get the following error:

    Error in selecthighestpcts(discrep1, inputter, pctlim) : 
      No variables are off by more than 0.00001 percent using the method you have chosen, either weighting is unnecessary or a smaller pre-raking limit should be chosen.

But this is objectively not true. Consider the quota_ed target, for example:

> targets[[4]]
  HS or less Some college    Bachelors     Advanced 
        0.41         0.29         0.19         0.11 
> wpct(m1b$quota_ed)
    Advanced    Bachelors   HS or less Some college 
   0.1614583    0.3645833    0.1666667    0.3072917

Any thoughts on what I'm doing wrong would be greatly appreciated. See this link to an RBloggers post for the routine I'm trying to emulate.

Best answer

For the anesrake function to work, the following steps may be necessary:

  1. Convert your weighting variables to factors, and make sure they contain no empty levels.
  2. Exclude empty levels from your targets as well. For example, if nobody aged 55+ were in your data, you would need to drop that level both from a) the quota_age target vector and b) the quota_age factor in m1b.
  3. The elements of the targets list also need to be named after the columns they are supposed to weight, i.e. after your commands add: names(targets) <- c("quota_age", "quota_race", "quota_gender", "quota_ed").
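
The three steps can be sketched roughly as follows. This is only an illustration under stated assumptions: m1b_demo is a small made-up stand-in for m1b (two weighting variables, nobody aged 55+), and rescaling the remaining targets so they sum to 1 after dropping a category is my own assumption, not something stated above. The anesrake call itself uses the same arguments as in the question, with a less extreme pctlim.

```r
# Hypothetical stand-in for m1b (made-up data, not the real survey file).
# Nobody aged 55+ is present, to illustrate step 2.
m1b_demo <- data.frame(
  ResponseId   = paste0("R_", 1:200),
  quota_age    = rep(c("18 to 34", "35 to 54"), times = c(120, 80)),
  quota_gender = rep(c("Female", "Male"), times = 100),
  stringsAsFactors = FALSE
)

quota_age    <- c("18 to 34" = 0.30, "35 to 54" = 0.33, "55+" = 0.37)
quota_gender <- c("Female" = 0.52, "Male" = 0.48)

# Step 3: name the list elements after the columns they weight.
targets <- list(quota_age = quota_age, quota_gender = quota_gender)

for (v in names(targets)) {
  # Step 1: convert to factor and drop levels that do not occur in the data.
  m1b_demo[[v]] <- droplevels(factor(m1b_demo[[v]], levels = names(targets[[v]])))
  # Step 2: keep only the target categories present in the data.
  # Assumption: the remaining targets are rescaled to sum to 1.
  present <- levels(m1b_demo[[v]])
  targets[[v]] <- targets[[v]][present] / sum(targets[[v]][present])
}

# Run the raking only if anesrake is installed.
if (requireNamespace("anesrake", quietly = TRUE)) {
  library(anesrake)
  raking <- anesrake(inputter     = targets,
                     dataframe    = m1b_demo,
                     caseid       = m1b_demo$ResponseId,
                     choosemethod = "total",
                     type         = "pctlim",
                     pctlim       = 0.05)
  m1b_demo$weight <- raking$weightvec
}
```

With the real data, the same loop would run over all four variables once targets is named as in step 3.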