how to determine the pareto optimum solutions of a 6D matrix in R?

64 Views Asked by At

i have a matrix in r with 128 rows (scenarios) and 6 columns (results for each scenario). i want to determine the pareto optimal scenarios among the 128 possibilities. the results of column 1,5 and 6 should be low. the results of column 2,3 and 4 should be high. i normalized the results for common scales. result_matrix is the matrix of simulated reuslts. i tried this code:

# Example matrix with values between 0 and 1 (not the simulated results I am analysing)
num_rows <- 128
num_cols <- 6

results_matrix <- matrix(runif(num_rows * num_cols), nrow = num_rows, ncol = num_cols)

# Initialize pareto as all points
pareto <- 1:nrow(results_matrix)

# Loop through the points and check for dominance
for (i in 1:nrow(results_matrix)) {
  for (n in 1:nrow(results_matrix)) {
    if (i != n) {
      # Check for dominance
      if (all(results_matrix[i, c(2,3,4)] >= results_matrix[n, c(2,3,4)]) &&
          all(results_matrix[i, c(1,5,6)] <= results_matrix[n, c(1,5,6)])) {
            pareto[i] <- NA
            break  # Break the inner loop when dominance is found
          }
    }
  }
}

# Print the indices of points on the Pareto front
print(pareto)

the output would be a vector with the pareto otimum indices. however,the results are not reasonable. e.g., one pareto option has the highest value in column one, although this should be low.

If I apply the rPref package, I get the pareto solution with the maximum or minimum values for each column, but I want also the solutions on the pareto front:

p <- low(column1, results_df)*high(results_df$column2)* high(results_df$column3)*high(results_df$column4)*    low(results_df$column5)*low(results_df$column6)

peval(p)
1

There are 1 best solutions below

0
On

The inner if is checking that row i dominates row n. It should be the other way around.

pareto1 <- function(m) {
  pareto <- 1:nrow(results_matrix)
  
  # Loop through the points and check for dominance
  for (i in 1:nrow(m)) {
    for (n in 1:nrow(m)) {
      if (i != n) {
        # Check for dominance
        if (all(m[i, c(2,3,4)] <= m[n, c(2,3,4)]) &&
            all(m[i, c(1,5,6)] >= m[n, c(1,5,6)])) {
          pareto[i] <- NA
          break  # Break the inner loop when dominance is found
        }
      }
    }
  }
  
  pareto[!is.na(pareto)]
}

We can also simplify things by flipping the signs of columns 1, 5, and 6. This will allow us to compare the entire row all at once.

pareto2 <- function(m) {
  m[,c(1, 5:6)] <- -m[,c(1, 5:6)]
  pareto <- 1:nrow(m)
  
  for (i in 1:nrow(m)) {
    for (j in (1:nrow(m))[-i]) {
      if (all(m[j,] >= m[i,])) {
        pareto[i] <- NA
        break
      }
    }
  }
  
  pareto[!is.na(pareto)]
}

Or a fully vectorized version:

pareto3 <- function(m) {
  m[,c(1, 5:6)] <- -m[,c(1, 5:6)]
  n <- nrow(m)
  (1:n)[-which(rowSums(m[sequence(rep(n, n - 1L), 1:n)%%n + 1L,] >= m[rep(1:n, each = n - 1L),]) == ncol(m))%/%(n - 1L)]
}

Timing with microbenchmark (and check that the results are equivalent with check = "equal"):

num_rows <- 128
num_cols <- 6
results_matrix <- matrix(runif(num_rows*num_cols), num_rows, num_cols)

microbenchmark::microbenchmark(
  pareto1 = pareto1(results_matrix),
  pareto2 = pareto2(results_matrix),
  pareto3 = pareto3(results_matrix),
  check = "equal"
)
#> Unit: microseconds
#>     expr     min       lq      mean   median       uq     max neval
#>  pareto1 13488.8 14570.55 16589.462 15147.45 16160.35 32005.4   100
#>  pareto2  9216.1  9965.85 11997.814 10694.85 12443.45 24896.5   100
#>  pareto3   779.3   854.30  1129.401  1071.30  1286.90  2413.1   100