Permutations of 3 elements within 6 positions

379 Views Asked by At

I'm looking to permute (or combine) c("a","b","c") within six positions under the condition to have always sequences with alternate elements, e.g abcbab.

Permutations could easily get with:

abc<-c("a","b","c")
permutations(n=3,r=6,v=abc,repeats.allowed=T)

I think is not possible to do that with gtools, and I've been trying to design a function for that -even though I think it may already exist.

2

There are 2 best solutions below

0
On BEST ANSWER

Since you're looking for permutations, expand.grid can work as well as permutations. But since you don't want like-neighbors, we can shorten the dimensionality of it considerably. I think this is legitimate random-wise!

Up front:

r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
dim(m)
# [1] 96  6
head(as.data.frame(cbind(m, apply(m, 1, paste, collapse = ""))))
#   Var1 Var2 Var3 Var4 Var5 Var6     V7
# 1    b    c    a    b    c    a bcabca
# 2    c    a    b    c    a    b cabcab
# 3    a    b    c    a    b    c abcabc
# 4    b    a    b    c    a    b babcab
# 5    c    b    c    a    b    c cbcabc
# 6    a    c    a    b    c    a acabca

Walk-through:

  • since you want all recycled permutations of it, we can use gtools::permutations, or we can use expand.grid ... I'll use the latter, I don't know if it's much faster, but it does a short-cut I need (more later)
  • when dealing with constraints like this, I like to expand on the indices of the vector of values
  • however, since we don't want neighbors to be the same, I thought that instead of each row of values being the straight index, we cumsum them; by using this, we can control the ability of the cumulative sum to re-reach the same value ... by removing 0 and length(abc) from the list of possible values, we remove the possibility of (a) never staying the same, and (b) never increasing actually one vector-length (repeating the same value); as a walk-through:

    head(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), n = 6)
    #   Var1 Var2 Var3 Var4 Var5 Var6
    # 1    1    1    1    1    1    1
    # 2    2    1    1    1    1    1
    # 3    3    1    1    1    1    1
    # 4    1    2    1    1    1    1
    # 5    2    2    1    1    1    1
    # 6    3    2    1    1    1    1
    

    Since the first value can be all three values, it's 1:3, but each additional is intended to be 1 or 2 away from it.

    head(t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum)), n = 6)
    #      Var1 Var2 Var3 Var4 Var5 Var6
    # [1,]    1    2    3    4    5    6
    # [2,]    2    3    4    5    6    7
    # [3,]    3    4    5    6    7    8
    # [4,]    1    3    4    5    6    7
    # [5,]    2    4    5    6    7    8
    # [6,]    3    5    6    7    8    9
    

    okay, that doesn't seem that useful (since it goes beyond the length of the vector), so we can invoke the modulus operator and a shift (since modulus returns 0-based, we want 1-based):

    head(t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum) %% 3 + 1), n = 6)
    #      Var1 Var2 Var3 Var4 Var5 Var6
    # [1,]    2    3    1    2    3    1
    # [2,]    3    1    2    3    1    2
    # [3,]    1    2    3    1    2    3
    # [4,]    2    1    2    3    1    2
    # [5,]    3    2    3    1    2    3
    # [6,]    1    3    1    2    3    1
    
  • To verify this works, we can do a diff across each row and look for 0:

    m <- t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum) %% 3 + 1)
    any(apply(m, 1, diff) == 0)
    # [1] FALSE
    
  • to automate this to an arbitrary vector, we enlist the help of replicate to generate the list of possible vectors:

    r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
    r[[1]] <- c(r[[1]], length(abc))
    str(r)
    # List of 6
    #  $ : int [1:3] 1 2 3
    #  $ : int [1:2] 1 2
    #  $ : int [1:2] 1 2
    #  $ : int [1:2] 1 2
    #  $ : int [1:2] 1 2
    #  $ : int [1:2] 1 2
    

    and then do.call to expand it.

  • one you have the matrix of indices,

    head(m)
    #      Var1 Var2 Var3 Var4 Var5 Var6
    # [1,]    2    3    1    2    3    1
    # [2,]    3    1    2    3    1    2
    # [3,]    1    2    3    1    2    3
    # [4,]    2    1    2    3    1    2
    # [5,]    3    2    3    1    2    3
    # [6,]    1    3    1    2    3    1
    

    and then replace each index with the vector's value:

    m[] <- abc[m]
    head(m)
    #      Var1 Var2 Var3 Var4 Var5 Var6
    # [1,] "b"  "c"  "a"  "b"  "c"  "a" 
    # [2,] "c"  "a"  "b"  "c"  "a"  "b" 
    # [3,] "a"  "b"  "c"  "a"  "b"  "c" 
    # [4,] "b"  "a"  "b"  "c"  "a"  "b" 
    # [5,] "c"  "b"  "c"  "a"  "b"  "c" 
    # [6,] "a"  "c"  "a"  "b"  "c"  "a" 
    
  • and then we cbind the united string (via apply and paste)


Performance:

library(microbenchmark)
library(dplyr)
library(tidyr)
library(stringr)

microbenchmark(
  tidy1 = {
    gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE) %>% 
      data.frame() %>% 
      unite(united, sep = "", remove = FALSE) %>%
      filter(!str_detect(united, "([a-c])\\1"))
  },
  tidy2 = {
      filter(unite(data.frame(gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE)),
                   united, sep = "", remove = FALSE),
             !str_detect(united, "([a-c])\\1"))
  },
  base = {
    r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
    r[[1]] <- c(r[[1]], length(abc))
    m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
    m[] <- abc[m]
  },
  times=10000
)
# Unit: microseconds
#   expr      min        lq     mean   median       uq       max neval
#  tidy1 1875.400 2028.8510 2446.751 2165.651 2456.051 12790.901 10000
#  tidy2 1745.402 1875.5015 2284.700 2000.051 2278.101 50163.901 10000
#   base  796.701  871.4015 1020.993  919.801 1021.801  7373.901 10000

I tried the infix (non-%>%) tidy2 version just for kicks, and though I was confident it would theoretically be faster, I didn't realize it would shave over 7% off the run-times. (The 50163 is likely R garbage-collecting, not "real".) The price we pay for readability/maintainability.

5
On

There are probably cleaner methods, but here ya go:

abc <- letters[1:3]

library(tidyverse)

res <- gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE) %>% 
  data.frame() %>% 
  unite(united, sep = "", remove = FALSE) %>%
  filter(!str_detect(united, "([a-c])\\1"))

head(res)  

  united X1 X2 X3 X4 X5 X6
1 ababab  a  b  a  b  a  b
2 ababac  a  b  a  b  a  c
3 ababca  a  b  a  b  c  a
4 ababcb  a  b  a  b  c  b
5 abacab  a  b  a  c  a  b
6 abacac  a  b  a  c  a  c

If you want a vector, you can use res$united or add %>% pull(united) as an additional step at the end of the pipes above.