Is it possible to order (dplyr arrange?) a skim_df object by mean?

Question

Asked by Jonathan Rauscher on 25 October 2019 at 14:55 · 219 views

I am using the skimr package to summarize a data frame whose columns are all logicals, so naturally I would like to order the result by the mean, from largest to smallest.

I have already tried piping the output of skim() into dplyr's arrange(), but that didn't work.

We are simply calling skim() on a data frame whose columns are all booleans/logicals.

There are 2 best solutions below

Elin On 14 November 2019 at 05:53

Here's an answer for skimr v2. In v2 the skim object is no longer in the long format. Here select() turns the skim object into a regular tibble (focus() would have kept it as a skimr object).

skim(df) %>% dplyr::select(skim_variable, logical.mean) %>% 
             dplyr::arrange(desc(logical.mean)) 
# A tibble: 3 x 2
  skim_variable logical.mean
  <chr>                <dbl>
1 c                     0.7 
2 a                     0.6 
3 b                     0.34

Alternatively

skim(df) %>% skimr::focus(skim_variable, logical.mean) %>% 
             dplyr::arrange(desc(logical.mean)) %>% as.data.frame()

  skim_type skim_variable logical.mean
1   logical             c         0.70
2   logical             a         0.60
3   logical             b         0.34

This alternative leaves the two meta columns (skim_type and skim_variable) in place. Calling as.data.frame() is one way to keep the summary from printing, but you can also tell print() to exclude the summary.

skim(df) %>% skimr::focus(skim_variable, logical.mean) %>% 
             dplyr::arrange(desc(logical.mean)) %>% 
             print(include_summary = FALSE)

── Variable type: logical ────────────────────────────────────────────────────────────────
  skim_variable  mean
1 c              0.7 
2 a              0.6 
3 b              0.34
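
As a cross-check that does not depend on the skimr version, the same ordering can be sketched in base R. This is only an illustration: the small data frame and the names col_means and ordered_means are made up here and are not part of the original question.

```r
# Hypothetical example data: three logical columns (not the asker's data)
df <- data.frame(
  a = c(TRUE, TRUE, FALSE, FALSE),
  b = c(TRUE, FALSE, FALSE, FALSE),
  c = c(TRUE, TRUE, TRUE, FALSE)
)

# The mean of a logical vector is the proportion of TRUE values
col_means <- sapply(df, mean)

# Sort from largest to smallest mean
ordered_means <- sort(col_means, decreasing = TRUE)
print(ordered_means)
```

If the order printed here disagrees with the order of the arranged skim output for the same data, the arrange step is the place to look.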

**slava-kohut** · Accepted Answer · 2019-10-25T15:54:48.867000

I tried that, and it seems like everything works as intended. skim_df inherits from data.frame, so I don't see why dplyr functions would not work on it.

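library(dplyr)  # provides %>%, filter(), arrange(), desc()

set.seed(123)
# Three logical columns with different proportions of TRUE
df <- data.frame(a = sample(c(TRUE, FALSE), 50, replace = TRUE),
                 b = c(rep(FALSE, 25), sample(c(TRUE, FALSE), 25, replace = TRUE)),
                 c = c(rep(TRUE, 25), sample(c(TRUE, FALSE), 25, replace = TRUE)))

# Long format (one row per statistic, as in skimr before v2):
# keep the "mean" rows, then sort by value, largest first
sdf <- skimr::skim(df) %>%
     dplyr::filter(stat == "mean") %>% dplyr::arrange(desc(value))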

sdf

Output

  variable type    stat  level value formatted
  <chr>    <chr>   <chr> <chr> <dbl> <chr>
1 c        logical mean  .all   0.8  0.8
2 a        logical mean  .all   0.5  0.5
3 b        logical mean  .all   0.26 0.26

I don't know what your problem is. Carefully check your code for obvious errors.
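
The claim that skim_df inherits from data.frame can be checked directly. A minimal sketch, assuming skimr is installed; the toy data frame and the names sk and is_df are hypothetical and only serve the illustration.

```r
library(skimr)

# Hypothetical toy data with logical columns
toy <- data.frame(x = c(TRUE, FALSE, TRUE), y = c(FALSE, FALSE, TRUE))

sk <- skim(toy)

# TRUE when the skim result is also a data.frame, which is why dplyr verbs accept it
is_df <- inherits(sk, "data.frame")
print(is_df)

# The full class vector shows where skim_df sits relative to data.frame
print(class(sk))
```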
