Weighted quartiles with descr() from summarytools

I'm computing basic descriptive statistics for a household survey data frame. One column reports the number of times an event occurred in a certain period of time. The survey also comes with a column named factor that holds the weight of each observation.

So, when I use this code

library(summarytools)

times_theater <- descr(data17$s08a_02, report.nas = FALSE, stats = "all")
times_theater

I get this

Descriptive Statistics  
data17$s08a_02  
N: 38201  

                    s08a_02
----------------- ---------
             Mean      2.58
          Std.Dev      2.41
              Min      1.00
               Q1      1.00
           Median      2.00
               Q3      3.00
              Max     40.00
              MAD      1.48
              IQR      2.00
               CV      0.93
         Skewness      5.80
      SE.Skewness      0.08
         Kurtosis     64.28
          N.Valid   1027.00
        Pct.Valid      2.69

These are the raw (unweighted) values, so I need to apply the weights:

times_theater <- descr(data17$s08a_02, report.nas = FALSE, weights = data17$factor, stats = "all")
times_theater

And the output is this:

Weighted Descriptive Statistics  
data17$s08a_02  
Weights: factor  
N: 38201  

                    s08a_02
--------------- -----------
           Mean        2.55
        Std.Dev        2.31
            Min        1.00
         Median        2.00
            Max       40.00
        N.Valid   288118.00
      Pct.Valid        2.57

As you can see, I lost the quartile information (Q1, Q3, IQR), and I would really like it to show up in the same output.

Any ideas on how to solve this?

P.S.: I know that in this case the differences are almost nonexistent, but there are some spending and income variables for which I will really need the quartiles later on.

Edit2: I know the documentation says descr() does not compute quartiles when weights are used. I want a way to calculate them and insert them into the previous output.

Answer by Zoltan Fabian:

The Hmisc package contains a number of weighted functions, including wtd.quantile. Consider the following snippet:

set.seed(1)
x <- runif(500)
wts <- sample(1:6, 500, TRUE)
quantile(x)
Hmisc::wtd.quantile(x, wts)

Which results in:

> wtd.quantile(x, wts)
         0%         25%         50%         75%        100% 
0.001836858 0.260238785 0.461551841 0.739641746 0.996077372 
> quantile(x)
         0%         25%         50%         75%        100% 
0.001836858 0.258128640 0.476269632 0.734145740 0.996077372

By default, wtd.quantile returns the quartiles (probs = c(0, .25, .5, .75, 1)). Of course, you can request any other quantiles through the probs argument. See ?wtd.quantile. There is also survey::svyquantile, in case you have a complex sampling design.
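Applied to the question, a minimal sketch could look like the one below. The real data17 is not available here, so the data frame is a made-up stand-in that only reuses the column names from the question (s08a_02 and factor). The name weighted_quartiles is likewise just illustrative. It computes weighted Q1, Q3 and IQR as a named vector that can be reported next to the weighted descr() output; it does not modify the descr() object itself.

```r
library(Hmisc)

# Hypothetical stand-in for the survey data (assumption: the real data17
# has a numeric column s08a_02 with many NAs and a weight column factor).
set.seed(1)
data17 <- data.frame(
  s08a_02 = c(sample(1:10, 50, replace = TRUE), rep(NA, 50)),
  factor  = sample(100:400, 100, replace = TRUE)
)

# Keep only the rows where the variable was actually observed
ok <- !is.na(data17$s08a_02)

# Weighted first and third quartiles
wq <- Hmisc::wtd.quantile(
  data17$s08a_02[ok],
  weights = data17$factor[ok],
  probs   = c(0.25, 0.75)
)

# Collect Q1, Q3 and their difference (IQR) in one named vector
weighted_quartiles <- c(
  Q1  = unname(wq[1]),
  Q3  = unname(wq[2]),
  IQR = unname(wq[2] - wq[1])
)
weighted_quartiles
```

The same pattern should carry over to the spending and income variables by swapping the column name.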