How to find all possible valid paths on a graph in R?

210 Views Asked by At

I'm working on an interesting problem in R (maybe using igraph and/or tidygraph libraries), where I need to find ALL POSSIBLE paths on a graph, which satisfy certain criteria. The problem can be simplified to the following:

I have 16 distinct nodes that can be separated into 4 sets, and where each node has a single characteristic, call it a color. *Note: Below might not be the best way to represent the data, but hopefully it communicates the situation.

nodes_set_1 <- c("red", "blue", "orange")
nodes_set_2 <- c("green", "blue", "red", "yellow", "purple")
nodes_set_3 <- c("blue", "green", "red", "orange", "purple")
nodes_set_4 <- c("orange", "blue", "green")

I now need to find ALL POSSIBLE paths between these nodes that satisfy the following three conditions: (1) Each path must contain exactly one node from each set. (2) The graph is directed from nodes_set_1 to nodes_set_2 to nodes_set_3 to nodes_set_4 (3) No color can be repeated within a single path.

So for example the following path would be valid:

path_1 <- c(nodes_set_1[1], nodes_set_2[1], nodes_set_3[1], nodes_set_4[1])

And the path below would be invalid because the color "blue" is repeated:

path_2 <- c(nodes_set_1[2], nodes_set_2[2], nodes_set_3[2], nodes_set_4[2])

I would love some advice on setting up this problem and solving. It would also be amazing to find a way to efficiently determine if no valid solution exists.

Thank you!

3

There are 3 best solutions below

1
ThomasIsCoding On

Base R options

I don't think you really need igraph or tidygraph to find all possible paths, and base R should be sufficient to make it. Below are two options:

- expand.grid + Filter

Using expand.grid to generate all combinations and then subset it based on criteria

nodes_lst <- list(nodes_set_1, nodes_set_2, nodes_set_3, nodes_set_4)
ps <- Filter(Negate(anyDuplicated), asplit(unname(expand.grid(nodes_lst)), 1))

and you will see

> ps
[[1]]
[1] "red"    "green"  "blue"   "orange"

[[2]]
[1] "red"    "yellow" "blue"   "orange"

[[3]]
[1] "red"    "purple" "blue"   "orange"

[[4]]
[1] "red"    "blue"   "green"  "orange"

[[5]]
[1] "blue"   "red"    "green"  "orange"

[[6]]
[1] "red"    "yellow" "green"  "orange"

[[7]]
[1] "blue"   "yellow" "green"  "orange"

[[8]]
[1] "red"    "purple" "green"  "orange"

[[9]]
[1] "blue"   "purple" "green"  "orange"

[[10]]
[1] "blue"   "green"  "red"    "orange"

[[11]]
[1] "blue"   "yellow" "red"    "orange"

[[12]]
[1] "blue"   "purple" "red"    "orange"

[[13]]
[1] "red"    "green"  "purple" "orange"

[[14]]
[1] "blue"   "green"  "purple" "orange"

[[15]]
[1] "red"    "blue"   "purple" "orange"

[[16]]
[1] "blue"   "red"    "purple" "orange"

[[17]]
[1] "red"    "yellow" "purple" "orange"

[[18]]
[1] "blue"   "yellow" "purple" "orange"

[[19]]
[1] "orange" "red"    "green"  "blue"  

[[20]]
[1] "red"    "yellow" "green"  "blue"

[[21]]
[1] "orange" "yellow" "green"  "blue"

[[22]]
[1] "red"    "purple" "green"  "blue"

[[23]]
[1] "orange" "purple" "green"  "blue"

[[24]]
[1] "orange" "green"  "red"    "blue"

[[25]]
[1] "orange" "yellow" "red"    "blue"

[[26]]
[1] "orange" "purple" "red"    "blue"

[[27]]
[1] "red"    "green"  "orange" "blue"

[[28]]
[1] "red"    "yellow" "orange" "blue"

[[29]]
[1] "red"    "purple" "orange" "blue"

[[30]]
[1] "red"    "green"  "purple" "blue"

[[31]]
[1] "orange" "green"  "purple" "blue"

[[32]]
[1] "orange" "red"    "purple" "blue"

[[33]]
[1] "red"    "yellow" "purple" "blue"

[[34]]
[1] "orange" "yellow" "purple" "blue"

[[35]]
[1] "orange" "red"    "blue"   "green"

[[36]]
[1] "red"    "yellow" "blue"   "green"

[[37]]
[1] "orange" "yellow" "blue"   "green"

[[38]]
[1] "red"    "purple" "blue"   "green"

[[39]]
[1] "orange" "purple" "blue"   "green"

[[40]]
[1] "orange" "blue"   "red"    "green"

[[41]]
[1] "blue"   "yellow" "red"    "green"

[[42]]
[1] "orange" "yellow" "red"    "green"

[[43]]
[1] "blue"   "purple" "red"    "green"

[[44]]
[1] "orange" "purple" "red"    "green"

[[45]]
[1] "red"    "blue"   "orange" "green" 

[[46]]
[1] "blue"   "red"    "orange" "green"

[[47]]
[1] "red"    "yellow" "orange" "green"

[[48]]
[1] "blue"   "yellow" "orange" "green"

[[49]]
[1] "red"    "purple" "orange" "green"

[[50]]
[1] "blue"   "purple" "orange" "green"

[[51]]
[1] "red"    "blue"   "purple" "green"

[[52]]
[1] "orange" "blue"   "purple" "green"

[[53]]
[1] "blue"   "red"    "purple" "green"

[[54]]
[1] "orange" "red"    "purple" "green"

[[55]]
[1] "red"    "yellow" "purple" "green" 

[[56]]
[1] "blue"   "yellow" "purple" "green"

[[57]]
[1] "orange" "yellow" "purple" "green"

- Recursion (More Efficient)

Probably a more efficient way is using recursion, by defining a custom function, such that all possible duplicates are skipped during the process of generating the paths

nodes_lst <- list(nodes_set_1, nodes_set_2, nodes_set_3, nodes_set_4)

f <- function(k = length(nodes_lst)) {
    if (k == 1) {
        return(as.list(nodes_lst[[k]]))
    }
    p <- nodes_lst[[k]]
    unlist(
        lapply(
            Recall(k - 1),
            \(x) Map(`c`, list(x), p[!p %in% x])
        ),
        recursive = FALSE
    )
}

and you can simply run f() and will obtain

> f()
[[1]]
[1] "red"    "green"  "blue"   "orange"

[[2]]
[1] "red"    "green"  "orange" "blue"

[[3]]
[1] "red"    "green"  "purple" "orange"

[[4]]
[1] "red"    "green"  "purple" "blue"

[[5]]
[1] "red"    "blue"   "green"  "orange"

[[6]]
[1] "red"    "blue"   "orange" "green"

[[7]]
[1] "red"    "blue"   "purple" "orange"

[[8]]
[1] "red"    "blue"   "purple" "green"

[[9]]
[1] "red"    "yellow" "blue"   "orange"

[[10]]
[1] "red"    "yellow" "blue"   "green"

[[11]]
[1] "red"    "yellow" "green"  "orange"

[[12]]
[1] "red"    "yellow" "green"  "blue"

[[13]]
[1] "red"    "yellow" "orange" "blue"

[[14]]
[1] "red"    "yellow" "orange" "green"

[[15]]
[1] "red"    "yellow" "purple" "orange"

[[16]]
[1] "red"    "yellow" "purple" "blue"

[[17]]
[1] "red"    "yellow" "purple" "green"

[[18]]
[1] "red"    "purple" "blue"   "orange"

[[19]]
[1] "red"    "purple" "blue"   "green"

[[20]]
[1] "red"    "purple" "green"  "orange"

[[21]]
[1] "red"    "purple" "green"  "blue"

[[22]]
[1] "red"    "purple" "orange" "blue"

[[23]]
[1] "red"    "purple" "orange" "green"

[[24]]
[1] "blue"   "green"  "red"    "orange"

[[25]]
[1] "blue"   "green"  "purple" "orange"

[[26]]
[1] "blue"   "red"    "green"  "orange"

[[27]]
[1] "blue"   "red"    "orange" "green"

[[28]]
[1] "blue"   "red"    "purple" "orange"

[[29]]
[1] "blue"   "red"    "purple" "green"

[[30]]
[1] "blue"   "yellow" "green"  "orange"

[[31]]
[1] "blue"   "yellow" "red"    "orange"

[[32]]
[1] "blue"   "yellow" "red"    "green"

[[33]]
[1] "blue"   "yellow" "orange" "green"

[[34]]
[1] "blue"   "yellow" "purple" "orange"

[[35]]
[1] "blue"   "yellow" "purple" "green"

[[36]]
[1] "blue"   "purple" "green"  "orange"

[[37]]
[1] "blue"   "purple" "red"    "orange"

[[38]]
[1] "blue"   "purple" "red"    "green"

[[39]]
[1] "blue"   "purple" "orange" "green"

[[40]]
[1] "orange" "green"  "red"    "blue"

[[41]]
[1] "orange" "green"  "purple" "blue"

[[42]]
[1] "orange" "blue"   "red"    "green"

[[43]]
[1] "orange" "blue"   "purple" "green"

[[44]]
[1] "orange" "red"    "blue"   "green" 

[[45]]
[1] "orange" "red"    "green"  "blue"

[[46]]
[1] "orange" "red"    "purple" "blue"

[[47]]
[1] "orange" "red"    "purple" "green"

[[48]]
[1] "orange" "yellow" "blue"   "green"

[[49]]
[1] "orange" "yellow" "green"  "blue"

[[50]]
[1] "orange" "yellow" "red"    "blue"

[[51]]
[1] "orange" "yellow" "red"    "green"

[[52]]
[1] "orange" "yellow" "purple" "blue"

[[53]]
[1] "orange" "yellow" "purple" "green"

[[54]]
[1] "orange" "purple" "blue"   "green"

[[55]]
[1] "orange" "purple" "green"  "blue"

[[56]]
[1] "orange" "purple" "red"    "blue"

[[57]]
[1] "orange" "purple" "red"    "green"

Benchmark

microbenchmark(
    grid = Filter(Negate(anyDuplicated), asplit(unname(expand.grid(nodes_lst)), 1)),
    recur = f(),
    unit = "relative"
)

shows

Unit: relative
  expr      min       lq     mean   median       uq      max neval
  grid 4.203662 4.403358 4.503495 4.518175 4.241935 7.159534   100
 recur 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000   100
1
s_baldur On

Another recursive solution.

foo <- function(lst) {
  helper <- \(k, history) {
    if (k == 0L) return(history)
    lapply(lst[[k]], \(n) if (!n %in% history) helper(k - 1L, c(n, history)))
  }
  helper(k = length(lst), integer(0L)) |> unlist() |> matrix(ncol = length(lst), byrow = TRUE)
}

nodes_lst <- list(nodes_set_1, nodes_set_2, nodes_set_3, nodes_set_4)

foo(nodes_lst)

We can work on the original data but it should be more efficient to work with integers than strings (and easy to convert back after the fact).

bar <- \() {
  tbl <- unique(unlist(nodes_lst))
  \(n) match(n, tbl)
}

nodes_lst_int <- lapply(nodes_lst, bar())
foo(nodes_lst_int)
1
clp On

Data

nodes_set_1 <- c("red", "blue", "orange")
nodes_set_2 <- c("green", "blue", "red", "yellow", "purple")
nodes_set_3 <- c("blue", "green", "red", "orange", "purple")
nodes_set_4 <- c("orange", "blue", "green")
nodes_lst   <- list(nodes_set_1, nodes_set_2, nodes_set_3, nodes_set_4)

Expand.grid
The following slightly faster alternatives can be considered as alternatives to asplit.

Filter(Negate(anyDuplicated), asplit(unname(expand.grid(nodes_lst)), 1))

library(microbenchmark)
microbenchmark(
    asplit   = Filter(Negate(anyDuplicated), asplit(unname(expand.grid(nodes_lst)), 1)),
    extract  = {grd <- as.matrix(expand.grid(nodes_lst)); grd[which(apply(grd, 1, anyDuplicated) == 0),] },
    subset   = {grd <- as.matrix(expand.grid(nodes_lst)); subset(grd, apply(grd, 1, anyDuplicated) == 0) },
    pipe     = {as.matrix(expand.grid(nodes_lst)) |>  {\(x) subset(x, apply(x, 1,  anyDuplicated) == 0)}() },
    unit     = "relative"
)
# Unit: relative
#     expr      min        lq     mean   median       uq      max neval
#     grid 1.648139 1.6357356 1.670698 1.637040 1.649674 1.606381   100
#  extract 1.003619 0.9992418 1.036993 1.000293 1.002364 1.016782   100
#   subset 1.000000 1.0000000 1.000000 1.000000 1.000000 1.000000   100
#     pipe 1.001724 0.9998736 1.008189 1.003053 1.005102 1.143216   100

Iterative version
Derived from f(), @ThomasisCoding.

f_iter <- function(lst) {
  ps <- lst[[1]]
  if (length(lst) > 1)
    for(i in 2:length(lst) ) {
      nset <- lst[[i]]
      unlist(
          lapply(
              ps,
              \(x) Map(`c`, list(x), nset[!nset %in% x])
          ),
          recursive = FALSE
      ) -> ps
    }
  return(ps)
}
# f_iter(nodes_lst)

Nested for-loops

f_for <- function(lst) {
  ps <- list();
  for (d1 in lst[[1]] )
    for (d2 in lst[[2]] )
      for (d3 in lst[[3]] )
         for (d4 in lst[[4]] ) {
             rrr <- c(d1, d2, d3, d4)
             if (anyDuplicated(rrr) == 0L) { ps[[length(ps) + 1]] <- rrr }
          }         
return(ps)
}          
ps <- f_for(nodes_lst)

library(microbenchmark)
microbenchmark(
  f_iter = f_iter(nodes_lst),
  f_for  = f_for(nodes_lst),
  unit = "relative"
)
# Unit: relative
#    expr      min       lq     mean   median       uq       max neval
#  f_iter 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000   100
#   f_for 1.927624 1.916648 1.646533 1.909522 1.870364 0.2904753   100

Update - By induction

  • ps <- V0, base case.
  • ps <- step(ps, Vn), induction step.
istep <- function(ps, nset) {
   unlist(
       lapply(
           ps,
           \(x) Map(`c`, list(x), nset[!nset %in% x])
       ),
       recursive = FALSE
   ) -> ps
}

# Same performance as iteration.
ps <- nodes_lst[[1]]
for (d in tail(nodes_lst, -1) ) ps <- istep(ps, d)