How would I get the position of the first occurrence of a value 0 in a number of binary matrices in R?


I am trying to get the position of the first occurrence of a value 0 in a number of binary matrices read in through a number of csv files.

I have got the number of 0s using...

sapply(files_to_use, function(x) sum(x == 0))

After reading in all csv files using...

reading_in_csv <- list.files(pattern="*.csv")
files_to_use <- lapply(reading_in_csv, read.delim)

I have tried the following code but get the error 'dim(X) must have a positive length'...

find_first_0 <- function(x){which(x = 0)}
apply(files,1,find_first_0)

Would anyone have any insight into the above? I was thinking of using which() to get the position, but I do not understand how to apply it to a number of matrices at once.

Given example matrix...

dimMat <- matrix(0, 1000, 10)

for(i in 1:1000){
  dimMat[i, ] <- sample(c(0,1), 10, replace = TRUE, prob = c(.3, .7))
}

print(dimMat)

There are 2 best solutions below


It is ugly, but I think this is what you are after:

delete_empty_matrices <- function(matrix_list) {
  # keep only the list elements that actually contain data
  matrix_list[lengths(matrix_list) != 0]
}

files_to_use <- delete_empty_matrices(files_to_use)

# for each file: column of the first 0 in every row, NA if the row has no 0
sapply(files_to_use, function(x) {
  apply(x, 1, function(y) which(y == 0)[1])
})
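
For reference, here is a minimal self-contained sketch of the same idea on a list of matrices. The names `mats` and `first_zero_per_row` are made up for illustration, and two small hand-written matrices stand in for the CSV files. It assumes that a row without any 0 should yield NA.

```r
# Hypothetical stand-ins for the matrices read from the CSV files
mats <- list(
  a = matrix(c(1, 0, 1,
               1, 1, 1,
               0, 0, 1), nrow = 3, byrow = TRUE),
  b = matrix(c(1, 1,
               0, 1), nrow = 2, byrow = TRUE)
)

# Column index of the first 0 in each row; NA when a row contains no 0
# (indexing an empty which() result with [1] gives NA)
first_zero_per_row <- function(m) {
  apply(m, 1, function(row) which(row == 0)[1])
}

# One integer vector per matrix, one entry per row
result <- lapply(mats, first_zero_per_row)
print(result)
```

Using lapply() rather than sapply() keeps the result a list even when the matrices have different numbers of rows.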

Here are a couple of ways to get the row and column indices of the first 0 in each row.

aggregate(col ~ row,
          data = which(dimMat == 0, arr.ind = TRUE),
          FUN = function(x) x[1])

complete_rows <- rowSums(dimMat) < ncol(dimMat)

cbind(row = seq_len(nrow(dimMat))[complete_rows],
      col = apply(dimMat == 0, 1, which.max)[complete_rows])
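
The filter on `complete_rows` matters because which.max() on a logical vector returns 1 when every element is FALSE, so a row without any 0 would look the same as a row whose first element is 0. A small sketch with made-up vectors (`no_zero` and `zero_first` are illustrative names):

```r
no_zero    <- c(1, 1, 1)  # contains no 0 at all
zero_first <- c(0, 1, 1)  # first element is 0

# Both calls return 1, so which.max() alone cannot tell the two cases apart
idx_no_zero    <- which.max(no_zero == 0)
idx_zero_first <- which.max(zero_first == 0)

# any() distinguishes them: FALSE for no_zero, TRUE for zero_first
has_zero <- c(any(no_zero == 0), any(zero_first == 0))
```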

Finding the first 0 in each column is very similar:

aggregate(row ~ col,
          data = which(dimMat == 0, arr.ind = TRUE),
          FUN = function(x) x[1])

complete_cols <- colSums(dimMat) < nrow(dimMat)

cbind(col = seq_len(ncol(dimMat))[complete_cols],
      row = apply(dimMat == 0, 2, which.max)[complete_cols])
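
If what is wanted is a single position per matrix, the very first 0 overall, the result depends on the scan direction. The sketch below is one possible reading of the question, not something stated in it; the matrix `m` and the variable names are made up, and it assumes the matrix contains at least one 0.

```r
m <- matrix(c(1, 1, 0,
              0, 1, 1), nrow = 2, byrow = TRUE)

# All zero positions as (row, col) pairs, listed in column-major order
# (assumes at least one 0, otherwise zeros[1, ] below would fail)
zeros <- which(m == 0, arr.ind = TRUE)

# First 0 when scanning down the columns, which is R's storage order
first_by_col <- zeros[1, ]

# First 0 when scanning across the rows: sort by row, then by column
ordered      <- zeros[order(zeros[, "row"], zeros[, "col"]), , drop = FALSE]
first_by_row <- ordered[1, ]
```

Here `first_by_col` is row 2, column 1, while `first_by_row` is row 1, column 3. For a list of matrices, the same steps can be wrapped in a function and passed to lapply().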