Creating multiple frequency count tibbles at once in R


I have data on 30 people that includes ethnicity, gender, school type, whether they received free school meals, etc.

I want to produce frequency counts for all of these features. Currently my code looks like this:

df <- read.csv("~file")
df %>% select(Ethnicity) %>% group_by(Ethnicity) %>% summarise(freq = n())
df %>% select(Gender) %>% group_by(Gender) %>% summarise(freq = n())
df %>% select(School.type) %>% group_by(School.type) %>% summarise(freq = n())

Is there a more efficient way (e.g. one or two lines of code) to create a frequency tibble for each of 8 columns (ethnicity, gender, school type, etc.)?

As an example output for the ethnicity code:

# A tibble: 13 × 2
   Ethnicity                             freq
   <chr>                                <int>
 1 Asian or Asian British - Bangladeshi     1
 2 Asian or Asian British - Indian          7
 3 Asian or Asian British - Pakistani       1
 4 Black or Black British - African         5
 5 Black or Black British - Caribbean       2
 6 Chinese                                  3
 7 Mixed - White and Asian                  2
 8 Mixed - White and Black African          1
 9 Mixed - White and Black Caribbean        1
10 Not known/ prefer not to say             1
11 White British                           27
12 White Irish                              1
13 White Other                              5

And for gender:

# A tibble: 2 × 2
  Gender  freq
  <chr>  <int>
1 Female    36
2 Male      21

NB: some columns hold postcodes and names, which I obviously don't want frequency counts for, so I think I'll need to select just the columns I want to count.

stefan On

One option would be to use lapply to loop over a vector of your desired columns and dplyr::count for the frequency table.

Using the starwars dataset as example data:

library(dplyr, warn.conflicts = FALSE)

cols <- c("hair_color", "sex")

lapply(cols, function(x) {
  count(starwars, .data[[x]], name = "freq")
})
#> [[1]]
#> # A tibble: 13 × 2
#>    hair_color     freq
#>    <chr>         <int>
#>  1 auburn            1
#>  2 auburn, grey      1
#>  3 auburn, white     1
#>  4 black            13
#>  5 blond             3
#>  6 blonde            1
#>  7 brown            18
#>  8 brown, grey       1
#>  9 grey              1
#> 10 none             37
#> 11 unknown           1
#> 12 white             4
#> 13 <NA>              5
#> 
#> [[2]]
#> # A tibble: 5 × 2
#>   sex             freq
#>   <chr>          <int>
#> 1 female            16
#> 2 hermaphroditic     1
#> 3 male              60
#> 4 none               6
#> 5 <NA>               4
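
To carry this over to the data in the question, one possible sketch is below. The small `df` is a made-up stand-in (the real file is not shown), and the column names `Name`, `Ethnicity`, `Gender` and `School.type` are assumed from the question's code. Passing a named vector to lapply gives a list named by column, so each table can be pulled out by name, and columns such as name or postcode are skipped simply by leaving them out of `cols`.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for the asker's data; values are invented for illustration.
df <- data.frame(
  Name        = c("A", "B", "C", "D"),
  Ethnicity   = c("Chinese", "White British", "White British", "Chinese"),
  Gender      = c("Female", "Male", "Female", "Female"),
  School.type = c("State", "State", "Private", "State")
)

# Only the columns to tabulate; Name (and e.g. a postcode column) are left out.
cols <- c("Ethnicity", "Gender", "School.type")

# lapply() keeps the names of its input, so naming the vector names the result.
freqs <- lapply(setNames(cols, cols), function(x) {
  count(df, .data[[x]], name = "freq")
})

# Pull out a single frequency table by column name.
freqs$Gender
```

With eight columns, only the `cols` vector needs to grow; the lapply call stays the same.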