summary_table in qwraps2 with group_by in R


I am trying out the qwraps2 package and some of its functions. In particular I am interested in the summary_table tool for output. I am using the iris data set for practice, but I noticed something strange when using group_by in the summary_table:

library(datasets)
data("iris")
options(qwraps2_markup = "markdown")
our_summary1 <-
  list("Sepal Length" =
       list("min" = ~ min(iris$Sepal.Length),
            "max" = ~ max(iris$Sepal.Length),
            "mean (sd)" = ~ qwraps2::mean_sd(iris$Sepal.Length)),
       "Sepal Width" =
       list("min" = ~ min(iris$Sepal.Width),
            "median" = ~ median(iris$Sepal.Width),
            "max" = ~ max(iris$Sepal.Width),
            "mean (sd)" = ~ qwraps2::mean_sd(iris$Sepal.Width)),
       "Petal Length" =
       list("min" = ~ min(iris$Petal.Length),
            "max" = ~ max(iris$Petal.Length),
            "mean (sd)" = ~ qwraps2::mean_sd(iris$Petal.Length)),
       "Petal Width" =
       list("min" = ~ min(iris$Petal.Width),
            "max" = ~ max(iris$Petal.Width),
            "mean (sd)" = ~ qwraps2::mean_sd(iris$Petal.Width)),
        "Species" =
       list("Setosa" = ~ qwraps2::n_perc0(iris$Species == "setosa"),
            "Versicolor"  = ~ qwraps2::n_perc0(iris$Species == "versicolor"),
            "Virginica"  = ~ qwraps2::n_perc0(iris$Species == "virginica"))
       )

bytype <- qwraps2::summary_table(dplyr::group_by(iris,Species),our_summary1)
bytype

The output I get is: (image: output from the above code)

This doesn't make sense: it says that the statistics for each variable are identical across the flower species, which they are not. I cross-checked this by running:

aggregate(iris[1:4], list(iris$Species), mean)

which shows that for example the mean of the different variables varies across species.
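A slightly fuller version of this cross-check, as a minimal sketch in base R only: it puts the means over the whole data set next to the means within each species. The names overall_means and species_means are made up for illustration.

```r
# Means over the whole data set: what an expression such as
# mean(iris$Sepal.Length) computes, whatever the grouping.
overall_means <- colMeans(iris[1:4])

# Means within each species: what a grouped summary should report.
species_means <- aggregate(iris[1:4], list(Species = iris$Species), mean)

overall_means
species_means
```

If a grouped table shows the overall_means values in every column, the summary expressions are being evaluated on the full data set rather than per group.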

Why is dplyr::group_by not doing what it should?

I posted the output as best I could; sorry, and thank you for your understanding.


There are 2 best solutions below

Peter On BEST ANSWER

The group_by call appears to do nothing because the summary definition does not use the data pronoun .data. As written, every expression refers to iris$... directly, so the summary table is computed on the whole iris data set, regardless of any grouping or subsetting. The .data pronoun is needed so that the tidyverse tools behind summary_table evaluate each expression within the current group.
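The scoping difference can be seen with plain dplyr, without qwraps2. This is only a minimal sketch: the column names mean_global and mean_grouped and the object res are made up for illustration, and it assumes a dplyr version that supports the .data pronoun inside summarise.

```r
library(dplyr)

# iris$Sepal.Length always refers to the full 150-row column,
# while .data$Sepal.Length refers to the rows of the current group.
res <- iris %>%
  group_by(Species) %>%
  summarise(
    mean_global  = mean(iris$Sepal.Length),
    mean_grouped = mean(.data$Sepal.Length)
  )

res
```

Here mean_global should be the same number in every row, while mean_grouped should differ by species, which mirrors the behaviour seen in the question.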

library(datasets)
library(qwraps2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data("iris")
options(qwraps2_markup = "markdown")

our_summary1 <-
  list("Sepal Length" =
       list("min" = ~ min(.data$Sepal.Length),
            "max" = ~ max(.data$Sepal.Length),
            "mean (sd)" = ~ qwraps2::mean_sd(.data$Sepal.Length)),
       "Sepal Width" =
       list("min" = ~ min(.data$Sepal.Width),
            "median" = ~ median(.data$Sepal.Width),
            "max" = ~ max(.data$Sepal.Width),
            "mean (sd)" = ~ qwraps2::mean_sd(.data$Sepal.Width)),
       "Petal Length" =
       list("min" = ~ min(.data$Petal.Length),
            "max" = ~ max(.data$Petal.Length),
            "mean (sd)" = ~ qwraps2::mean_sd(.data$Petal.Length)),
       "Petal Width" =
       list("min" = ~ min(.data$Petal.Width),
            "max" = ~ max(.data$Petal.Width),
            "mean (sd)" = ~ qwraps2::mean_sd(.data$Petal.Width)),
        "Species" =
       list("Setosa" = ~ qwraps2::n_perc0(.data$Species == "setosa"),
            "Versicolor"  = ~ qwraps2::n_perc0(.data$Species == "versicolor"),
            "Virginica"  = ~ qwraps2::n_perc0(.data$Species == "virginica"))
       )


bytype <- qwraps2::summary_table(dplyr::group_by(iris,Species),our_summary1)
bytype
#> 
#> 
#> |                        |Species: setosa (N = 50) |Species: versicolor (N = 50) |Species: virginica (N = 50) |
#> |:-----------------------|:------------------------|:----------------------------|:---------------------------|
#> |**Sepal Length**        |&nbsp;&nbsp;             |&nbsp;&nbsp;                 |&nbsp;&nbsp;                |
#> |&nbsp;&nbsp; min        |4.3                      |4.9                          |4.9                         |
#> |&nbsp;&nbsp; max        |5.8                      |7.0                          |7.9                         |
#> |&nbsp;&nbsp; mean (sd)  |5.01 &plusmn; 0.35       |5.94 &plusmn; 0.52           |6.59 &plusmn; 0.64          |
#> |**Sepal Width**         |&nbsp;&nbsp;             |&nbsp;&nbsp;                 |&nbsp;&nbsp;                |
#> |&nbsp;&nbsp; min        |2.3                      |2.0                          |2.2                         |
#> |&nbsp;&nbsp; median     |3.4                      |2.8                          |3.0                         |
#> |&nbsp;&nbsp; max        |4.4                      |3.4                          |3.8                         |
#> |&nbsp;&nbsp; mean (sd)  |3.43 &plusmn; 0.38       |2.77 &plusmn; 0.31           |2.97 &plusmn; 0.32          |
#> |**Petal Length**        |&nbsp;&nbsp;             |&nbsp;&nbsp;                 |&nbsp;&nbsp;                |
#> |&nbsp;&nbsp; min        |1.0                      |3.0                          |4.5                         |
#> |&nbsp;&nbsp; max        |1.9                      |5.1                          |6.9                         |
#> |&nbsp;&nbsp; mean (sd)  |1.46 &plusmn; 0.17       |4.26 &plusmn; 0.47           |5.55 &plusmn; 0.55          |
#> |**Petal Width**         |&nbsp;&nbsp;             |&nbsp;&nbsp;                 |&nbsp;&nbsp;                |
#> |&nbsp;&nbsp; min        |0.1                      |1.0                          |1.4                         |
#> |&nbsp;&nbsp; max        |0.6                      |1.8                          |2.5                         |
#> |&nbsp;&nbsp; mean (sd)  |0.25 &plusmn; 0.11       |1.33 &plusmn; 0.20           |2.03 &plusmn; 0.27          |
#> |**Species**             |&nbsp;&nbsp;             |&nbsp;&nbsp;                 |&nbsp;&nbsp;                |
#> |&nbsp;&nbsp; Setosa     |50 (100)                 |0 (0)                        |0 (0)                       |
#> |&nbsp;&nbsp; Versicolor |0 (0)                    |50 (100)                     |0 (0)                       |
#> |&nbsp;&nbsp; Virginica  |0 (0)                    |0 (0)                        |50 (100)                    |

Created on 2020-03-01 by the reprex package (v0.3.0)


Klowie Stewart On

Try using .data$ instead of iris$ when declaring the variables. I had the same problem and this fixed it.