How to use purrr to programmatically cat and/or print janitor tabyl output

Say you're using the tidyverse to nest() a select group of categorical variables:

library(tidyverse)
library(janitor)

nested_df <- mpg %>%
  select(manufacturer, class) %>%
  gather(variable, value) %>%
  group_by(variable) %>%
  nest()

nested_df
# A tibble: 2 x 2
  variable     data              
  <chr>        <list>            
1 manufacturer <tibble [234 x 1]>
2 class        <tibble [234 x 1]>

Now we can add a new column which contains the output from janitor::tabyl:

nested_df %>%
  mutate(
    table_output = map(data, ~ tabyl(.$value))
  )

# A tibble: 2 x 3
  variable     data               table_output    
  <chr>        <list>             <list>          
1 manufacturer <tibble [234 x 1]> <tabyl [15 x 3]>
2 class        <tibble [234 x 1]> <tabyl [7 x 3]> 

Questions:

  1. How can we print or walk through the output to get both the variable name and the table_output?
  2. Is there a better approach (e.g. using split instead of group_by %>% nest)?

Something like printing the following...

Variable is: manufacturer

Tabyl Output:

    .$value  n    percent
       audi 18 0.07692308
  chevrolet 19 0.08119658
      dodge 37 0.15811966
       ford 25 0.10683761
 ...more rows...
    mercury  4 0.01709402
     nissan 13 0.05555556
    pontiac  5 0.02136752
     subaru 14 0.05982906
     toyota 34 0.14529915
 volkswagen 27 0.11538462


Variable is: class

Tabyl Output:

    .$value  n    percent
    2seater  5 0.02136752
    compact 47 0.20085470
    midsize 41 0.17521368
    minivan 11 0.04700855
     pickup 33 0.14102564
 subcompact 35 0.14957265
        suv 62 0.26495726

Answer by acylam (best answer)

We can use pwalk, cat, and print. The input to pwalk is a data frame (a list of columns) containing only the variable and table_output columns. Like pmap, pwalk steps through the columns in parallel, one row at a time; inside the anonymous function the first column's element is available as .x and the second's as .y. Unlike pmap, pwalk does not collect the results: it calls the function purely for its side effects and returns its input invisibly. This is useful when we only want the side effect of the code execution:

library(tidyverse)
library(janitor)

nested_df <- mpg %>%
  select(manufacturer, class) %>%
  gather(variable, value) %>%
  group_by(variable) %>%
  nest()

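nested_df %>%
  mutate(
    # build one tabyl per nested data frame
    table_output = map(data, ~ tabyl(.$value))
  ) %>%
  # keep only the two columns pwalk should iterate over
  select(-data) %>%
  pwalk(~{
    # .x is the variable name, .y is the matching tabyl
    cat(paste0('Variable is: ', .x, '\n\nTabyl Output: \n\n'))
    print(.y)
    cat('\n\n')
  })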
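
The difference between pmap and pwalk described above can be checked on a toy input. This is a minimal sketch, not part of the pipeline itself; inputs, mapped, and walked are illustrative names, and the two short vectors stand in for the variable and table_output columns.

```r
library(purrr)

# Hypothetical toy input: two parallel vectors in a named list
inputs <- list(name = c("a", "b"), n = c(1, 2))

# pmap() collects whatever the function returns into a list
mapped <- pmap(inputs, function(name, n) paste(name, n))

# pwalk() calls the function only for its side effect (printing here)
# and hands its input back invisibly
walked <- pwalk(inputs, function(name, n) cat(name, n, "\n"))

# TRUE: pwalk() returned the list it was given
identical(walked, inputs)
```

Because the input comes back unchanged, a pwalk call can sit in the middle of a pipe without disturbing the steps after it.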

To print strings, we use cat, which avoids the [1] prefix and the quotes that print would add. To print the table output, we use print. The "\n"s add blank lines for readability.
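
The cat versus print behaviour on a plain string is easy to check in isolation. A minimal base R sketch (msg is just an illustrative variable):

```r
msg <- "Variable is: class"

# print() shows the R representation: an index prefix and quotes
print(msg)
#> [1] "Variable is: class"

# cat() writes the raw characters; the newline has to be added explicitly
cat(msg, "\n", sep = "")
#> Variable is: class
```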

Output:

Variable is: manufacturer

Tabyl Output: 

    .$value  n    percent
       audi 18 0.07692308
  chevrolet 19 0.08119658
      dodge 37 0.15811966
       ford 25 0.10683761
      honda  9 0.03846154
    hyundai 14 0.05982906
       jeep  8 0.03418803
 land rover  4 0.01709402
    lincoln  3 0.01282051
    mercury  4 0.01709402
     nissan 13 0.05555556
    pontiac  5 0.02136752
     subaru 14 0.05982906
     toyota 34 0.14529915
 volkswagen 27 0.11538462


Variable is: class

Tabyl Output: 

    .$value  n    percent
    2seater  5 0.02136752
    compact 47 0.20085470
    midsize 41 0.17521368
    minivan 11 0.04700855
     pickup 33 0.14102564
 subcompact 35 0.14957265
        suv 62 0.26495726
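
On the second question, one possible alternative (not part of the answer above) is to split the long data frame into a named list and walk over it with iwalk, which passes each element together with its name. This is only a sketch, assuming the same mpg data and packages as above; long_df and tables are illustrative names.

```r
library(tidyverse)
library(janitor)

# Same long format as in the question, but without group_by() and nest()
long_df <- mpg %>%
  select(manufacturer, class) %>%
  gather(variable, value)

# split() returns a named list with one data frame per variable;
# map() then builds one tabyl per list element
tables <- long_df %>%
  split(.$variable) %>%
  map(~ tabyl(.x$value))

# iwalk() passes each element and its name to the function
iwalk(tables, function(tbl, name) {
  cat("Variable is: ", name, "\n\nTabyl Output:\n\n", sep = "")
  print(tbl)
  cat("\n\n")
})
```

Note that split orders the groups alphabetically, so class is printed before manufacturer here. Whether this is better than the nested version is a matter of taste: the list version is shorter, while the nested tibble keeps the tables next to their source data.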