Is it possible to order (dplyr arrange?) a skim_df object by mean?


I am using the package skimr to summarize data that are all logicals, so naturally I would like to order the result by the mean from largest to smallest.

I have already tried piping the output of skim() into dplyr's arrange(), but that didn't work.

I am simply calling skim() on a data frame whose columns are all booleans/logicals.


2
slava-kohut On BEST ANSWER

I tried that, and everything seems to work as intended. skim_df inherits from data.frame, so I don't see why dplyr functions would not work on it.

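library(dplyr)  # provides %>% and desc()

set.seed(123)
# Three logical columns with different shares of TRUE
df <- data.frame(a = sample(c(TRUE, FALSE), 50, replace = TRUE),
                 b = c(rep(FALSE, 25), sample(c(TRUE, FALSE), 25, replace = TRUE)),
                 c = c(rep(TRUE, 25), sample(c(TRUE, FALSE), 25, replace = TRUE)))

# Keep only the "mean" rows of the long skim output, then sort descending
sdf <- skimr::skim(df) %>%
  dplyr::filter(stat == "mean") %>%
  dplyr::arrange(desc(value))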

sdf

Output

  variable type    stat  level value formatted
  <chr>    <chr>   <chr> <chr> <dbl> <chr>
1 c        logical mean  .all   0.8  0.8
2 a        logical mean  .all   0.5  0.5
3 b        logical mean  .all   0.26 0.26
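An ordering like the one above can be cross-checked without skimr at all, since the mean of a logical column is just its share of TRUE values. The following is a minimal base R sketch, not part of the original answer; the data frame `toy` and its values are hypothetical and chosen so the result is easy to verify by hand.

```r
# Hypothetical toy data: three logical columns with known shares of TRUE.
toy <- data.frame(a = c(TRUE, FALSE, TRUE, FALSE),   # mean 0.50
                  b = c(FALSE, FALSE, FALSE, TRUE),  # mean 0.25
                  c = c(TRUE, TRUE, TRUE, FALSE))    # mean 0.75

# Mean of each column; vapply() guarantees one numeric value per column.
col_means <- vapply(toy, mean, numeric(1))

# Sort from largest to smallest, keeping the column names.
ranked <- sort(col_means, decreasing = TRUE)
ranked
```

The names of `ranked` give the column order (here c, a, b), which should match the order that the skim-based pipeline produces on the same data.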

I can't tell what is going wrong in your case. Carefully check your code for obvious errors.
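One quick thing to check is the class of the object being piped. The sketch below tests the claim that the result of skim() inherits from data.frame; the data frame `toy` is hypothetical, and the exact class vector may differ between skimr versions.

```r
library(skimr)

# Hypothetical toy data: three logical columns.
toy <- data.frame(a = c(TRUE, FALSE, TRUE, FALSE),
                  b = c(FALSE, FALSE, FALSE, TRUE),
                  c = c(TRUE, TRUE, TRUE, FALSE))

sk <- skim(toy)

# Full class vector of the skim result.
class(sk)

# TRUE here means data-frame methods, including dplyr verbs, can dispatch on it.
is_df <- inherits(sk, "data.frame")
is_df
```

If `is_df` is TRUE and arrange() still fails, the cause is more likely a column name that does not exist in the installed skimr version than the object type itself.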

0
Elin On

Here's an answer for v2. In v2 the skim object is no longer in long format. Here select() turns the skim object into a regular tibble (focus() would have kept it as a skimr object).

library(skimr)
library(dplyr)  # for %>% and desc()
skim(df) %>% dplyr::select(skim_variable, logical.mean) %>%
             dplyr::arrange(desc(logical.mean))
# A tibble: 3 x 2
  skim_variable logical.mean
  <chr>                <dbl>
1 c                     0.7 
2 a                     0.6 
3 b                     0.34

Alternatively

skim(df) %>% skimr::focus(skim_variable, logical.mean) %>% 
             dplyr::arrange(desc(logical.mean)) %>% as.data.frame()

  skim_type skim_variable logical.mean
1   logical             c         0.70
2   logical             a         0.60
3   logical             b         0.34

This leaves the two meta columns in place. The as.data.frame() call is one way to keep the summary from printing, but you can also tell print() to exclude the summary.

skim(df) %>% skimr::focus(skim_variable, logical.mean) %>% 
             dplyr::arrange(desc(logical.mean)) %>% 
             print(include_summary = FALSE)

── Variable type: logical ────────────────────────────────────────────────────────────────
  skim_variable  mean
1 c              0.7 
2 a              0.6 
3 b              0.34
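If the same script has to run under both skimr v1 and v2, the two answers can be combined by branching on the installed version. This is only a sketch: the 2.0.0 cut-off is an assumption about where the long format was dropped, the `toy` data frame is hypothetical, and the column names are the ones shown in the answers above.

```r
library(dplyr)
library(skimr)

# Hypothetical toy data: three logical columns.
toy <- data.frame(a = c(TRUE, FALSE, TRUE, FALSE),
                  b = c(FALSE, FALSE, FALSE, TRUE),
                  c = c(TRUE, TRUE, TRUE, FALSE))

sk <- skim(toy)

# Assumption: versions from 2.0.0 onwards use the wide format.
if (utils::packageVersion("skimr") >= "2.0.0") {
  # v2: one row per variable, the mean is stored in logical.mean
  ranked <- sk %>%
    dplyr::select(skim_variable, logical.mean) %>%
    dplyr::arrange(desc(logical.mean))
  ranked_names <- ranked$skim_variable
} else {
  # v1: long format, one row per variable and statistic
  ranked <- sk %>%
    dplyr::filter(stat == "mean") %>%
    dplyr::arrange(desc(value))
  ranked_names <- ranked$variable
}

ranked_names
```

Either branch yields the variable names ordered by mean from largest to smallest.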