Table in R to be weighted

I'm trying to build a crosstab/contingency table, but I need it weighted by a weighting variable. Here is some sample data:

set.seed(123)
sex <- sample(c("Male", "Female"), 100, replace = TRUE)
age <- sample(c("0-15", "16-29", "30-44", "45+"), 100, replace = TRUE)
wgt <- sample(c(1:10), 100, replace = TRUE)
df <- data.frame(age,sex, wgt)

I've run this to get a regular (unweighted) crosstab:

table(df$sex, df$age)

To get weighted frequencies, I tried the Hmisc package (if you know a better package, let me know):

library(Hmisc)
wtd.table(df$sex, df$age, weights=df$wgt)
Error in match.arg(type) : 'arg' must be of length 1

I'm not sure where I've gone wrong, but it doesn't run, so any help would be great. Alternatively, if you know how to do this with another package that is better suited to analysing survey data, that would be great too. Many thanks in advance.

There are 4 best solutions below

3
On BEST ANSWER

Try this

GDAtools::wtable(df$sex, df$age, w = df$wgt)

Output

       0-15 16-29 30-44 45+ NA tot
Female   56    73    60  76  0 265
Male     76    99   106  90  0 371
NA        0     0     0   0  0   0
tot     132   172   166 166  0 636

Update

In case you do not want to install the whole package, the two essential functions you need are wtable and dichotom. Source them and you should be able to use wtable without any problem.

8
On

A solution is to repeat the rows of the data.frame by weight and then table the result.

The following repeats the data.frame's rows (only relevant columns):

df[rep(row.names(df), df$wgt), 1:2]

And it can be used to get the contingency table.

table(df[rep(row.names(df), df$wgt), 1:2])
#       sex
#age     Female Male
#  0-15      56   76
#  16-29     73   99
#  30-44     60  106
#  45+       76   90
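
Note that repeating rows only makes sense when the weights are non-negative whole numbers. As a rough check of the idea, here is a minimal sketch on a small made-up data frame (toy and its values are hypothetical, not the df from the question), showing that the grand total of the expanded table equals the sum of the weights:

```r
# Hypothetical toy data for illustration only (not the question's df)
toy <- data.frame(
  age = c("0-15", "0-15", "16-29"),
  sex = c("Male", "Female", "Male"),
  wgt = c(2, 3, 1)
)

# Repeat each row index wgt times, keeping only the classifying columns
expanded <- toy[rep(seq_len(nrow(toy)), toy$wgt), c("age", "sex")]

# Tabulate the expanded rows: each cell is now a weighted count
tab <- table(expanded)
print(tab)

# Sanity check: the table total must equal the sum of the weights
stopifnot(sum(tab) == sum(toy$wgt))
```

For fractional weights this approach does not apply, and summing the weights per cell is the safer route.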
2
On

A tidyverse solution using your data and the same set.seed. uncount is the equivalent of @Rui's rep on the weights.

library(dplyr)
library(tidyr)

df %>%
   uncount(wgt) %>%   # repeat each row wgt times; the wgt column is dropped by default
   table()
#>        sex
#> age     Female Male
#>   0-15      56   76
#>   16-29     73   99
#>   30-44     60  106
#>   45+       76   90
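
If expanding the rows feels wasteful, a possible alternative is to sum the weights per group directly with dplyr's count and its wt argument. This is only a sketch on a small made-up data frame (toy is hypothetical, not the question's df); the result is in long format, so xtabs is used at the end to spread it into a two-way table:

```r
library(dplyr)

# Hypothetical toy data for illustration only (not the question's df)
toy <- data.frame(
  age = c("0-15", "0-15", "16-29", "16-29"),
  sex = c("Male", "Female", "Male", "Male"),
  wgt = c(2, 3, 1, 4)
)

# count() with wt sums the weights within each age/sex combination;
# the summed weights land in a column named n
counts <- toy %>%
  count(age, sex, wt = wgt)
print(counts)

# Spread the long result into a two-way table (empty cells become 0)
wide <- xtabs(n ~ age + sex, data = counts)
print(wide)
```

Unlike uncount, this also works when the weights are not whole numbers.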
0
On

Base R, in stats, has xtabs for exactly this:

xtabs(wgt ~ age + sex, data=df)
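
The left-hand side of the formula is summed within each combination of the right-hand-side variables, so the weights do not need to be whole numbers. A minimal sketch on a small made-up data frame (toy and its values are hypothetical, not the question's df), which also shows weighted proportions and margins with base R helpers:

```r
# Hypothetical toy data with fractional weights (not the question's df)
toy <- data.frame(
  age = c("0-15", "0-15", "16-29", "16-29"),
  sex = c("Male", "Female", "Male", "Male"),
  wgt = c(1.5, 2, 0.5, 1)
)

# Weighted contingency table: sums wgt within each age x sex cell
wtab <- xtabs(wgt ~ age + sex, data = toy)
print(wtab)

# Weighted cell proportions (each cell divided by the grand total)
wprop <- prop.table(wtab)
print(wprop)

# Same table with row and column totals appended
print(addmargins(wtab))
```

This also sidesteps the error in the question: as far as I can tell, Hmisc's wtd.table tabulates a single vector x, so a second vector passed positionally ends up matched to another argument.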