How to order partly inside a data frame according to different columns and values?

73 Views Asked by At

I would like to order rows within a data frame according to different columns and I don't know how I could achieve this. Here an example to clarify:

x <- data.frame(X=c(10:1), Y=c(6,6,3,6,3,3,9,9,9,2), L=c("A","B","C","C","A","B","C","A","B","C"))
x 
    X Y L 
1  10 6 A
2   9 6 B
3   8 3 C
4   7 6 C
5   6 3 A
6   5 3 B
7   4 9 C
8   3 9 A
9   2 9 B
10  1 2 C

to

    X Y L
1   3 9 A
2   6 3 A
3  10 6 A
4   9 6 B
5   5 3 B
6   2 9 B
7   1 2 C
8   8 3 C
9   7 6 C
10  4 9 C

In this example, the values that are "A" are ordered by increasing "X". The values that have "B" are ordered by decreasing "X" and values with "C" are ordered by increasing "Y".

Can this be done with order() or %>% arrange? (data frame is large)

Thanks!

4

There are 4 best solutions below

1
On

You can create a new vector with the conditions which is used by order.

y <- x$X
i <- x$L == "B"
y[i] <- y[i] * -1
i <- x$L == "C"
y[i] <- x$Y[i]
x[order(x$L, y),]
#    X Y L
#8   3 9 A
#5   6 3 A
#1  10 6 A
#2   9 6 B
#6   5 3 B
#9   2 9 B
#10  1 2 C
#3   8 3 C
#4   7 6 C
#7   4 9 C
0
On

I would write a function which does the job for you. With order_pattern you can specify on which columns you want to order and whether it should be increasing or decreasing

order_pattern <- list(A = c("X", "inc"), B = c("X", "dec"), C = c("Y", "inc"))
 
order_partly <- function(dat, ord_pat){
  result <- dat[0,]
  for (pattern_col in names(ord_pat)){
    order_col <- ord_pat[[pattern_col]][1]
    decreasing <- if (ord_pat[[pattern_col]][2] == "dec") T else F
    partial_dat <- dat[dat$L == pattern_col,]
    ord <- order(partial_dat[order_col], decreasing = decreasing)
    result <- rbind(result, partial_dat[ord, ])
  }
  result
}

order_partly(x, order_pattern)

    X Y L
8   3 9 A
5   6 3 A
1  10 6 A
2   9 6 B
6   5 3 B
9   2 9 B
10  1 2 C
3   8 3 C
4   7 6 C
7   4 9 C
0
On

One option is to create a new column and fill it with values corresponding to your ordering scheme, sort by that column, then remove the column (decorate-sort-undecorate: https://en.wikipedia.org/wiki/Schwartzian_transform)

library(tidyverse)
x <- data.frame(X=c(10:1), Y=c(6,6,3,6,3,3,9,9,9,2), L=c("A","B","C","C","A","B","C","A","B","C"))
x %>%
  group_by(L) %>%
  mutate(shwartz = ifelse(L == "A", X,
                          ifelse(L == "B", 1 / X,
                                 ifelse(L == "C", Y,
                                        "error")))) %>% 
  arrange(L, shwartz) %>% 
  select(-shwartz) %>% 
  ungroup()
# A tibble: 10 x 3
#       X     Y L    
#   <int> <dbl> <chr>
# 1     3     9 A    
# 2     6     3 A    
# 3    10     6 A    
# 4     9     6 B    
# 5     5     3 B    
# 6     2     9 B    
# 7     1     2 C    
# 8     8     3 C    
# 9     7     6 C    
#10     4     9 C   
0
On

We can also use the following solution:

library(dplyr)
library(purrr)

df %>%
  group_split(L) %>%
  map_dfr(~ if(.x$L[1] == "A") {
    .x %>% arrange(.x$X)
  } else if(.x$L[1] == "B") {
    .x %>% arrange(desc(.x$X))
  } else {
    .x %>% arrange(.x$Y)
  })

# A tibble: 10 x 3
       X     Y L    
   <int> <dbl> <chr>
 1     3     9 A    
 2     6     3 A    
 3    10     6 A    
 4     9     6 B    
 5     5     3 B    
 6     2     9 B    
 7     1     2 C    
 8     8     3 C    
 9     7     6 C    
10     4     9 C