Filtering rows of count matrix with `rowSums` gives error 'x' must be numeric


I have a count matrix in .csv format. The data is structured like this:

Genes   cond. 1   cond. 2   cond. 3
Alpha        77        51        98
Beta          0         0        71
Cena        823       856         0

I'm trying to filter the matrix so that the new matrix keeps only the rows whose sum is above 0.

To filter out the rows whose sum is 0, I wrote the code in this way:

Counts <- Counts[which(rowSums(Counts) > 0), ]

but it gave me an error saying:

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for 
  function 'which': 'x' must be numeric

I checked the data to see if there were any NAs, but there were none. The values are all numeric.

Here is my full code:

Counts <- read.delim("RiboTag_count_matrix_10-05-2023.csv", 
                     header = TRUE, sep=",")

Counts  ## shows the matrix visually

Counts <- Counts[which(rowSums(Counts) > 0), ]  ## filter out rows whose sum is 0

I'm not sure where in my code it produced an error. Any advice is greatly appreciated. Thank you.

I even tried

## Convert Counts to a matrix
Counts <- as.matrix(Counts)

## Convert them to numeric
Counts <- apply(Counts, 2, as.numeric)

# Check for and handle missing values
if (any(is.na(Counts))) {
  ## Handle missing values (e.g., replace with 0)
  Counts[is.na(Counts)] <- 0
}

to troubleshoot the problem, but it still gave me the same error message.

Answer by jay.sf

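Wait, you are trying to drink your coffee without milk, but you buy it with milk first and take the milk out later. The "milk" here is the Genes column: it is read in as an ordinary character column, and that is what makes rowSums complain that 'x' must be numeric.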

Use read.delim properly

In the documentation, ?read.delim, further down in the section "CSV files", we read:

the commonest form of CSV file with row names needs to be read with read.csv(..., row.names = 1) to use the names in the first column of the file as row names,

which also applies to read.delim. In the "Value" section just above, we learn that the result is a data.frame, so we need as.matrix() to get a matrix.

So to get the matrix with desired integers and row and column names, do

(Counts <- read.delim('foo.csv', row.names=1) |> as.matrix())
#         cond.1 cond.2 cond.3
# Alpha       77     51     98
# Beta         0      0     71
# Gamma      823    856      0
# Delta        0      0      0
# Epsilon     59     NA      1

class(Counts)
# [1] "matrix" "array" 
typeof(Counts)
# [1] "integer"

and everything is fine.

Note: If your .csv file is actually comma separated, use read.csv instead; your use of sep=',' confuses me a little.
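For a file that really is comma separated, the same idea might look like the sketch below. The file contents and the temporary path are made up for illustration (they are not your actual RiboTag file); only read.csv(..., row.names = 1) and as.matrix() are the point here.

```r
# Hypothetical comma-separated count file, written to a temporary path
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Genes,cond.1,cond.2,cond.3",
             "Alpha,77,51,98",
             "Beta,0,0,71",
             "Gamma,823,856,0",
             "Delta,0,0,0"), csv_path)

# row.names = 1 uses the first column (gene names) as row names,
# so only the numeric count columns remain as data
Counts <- as.matrix(read.csv(csv_path, row.names = 1))

typeof(Counts)  # storage type of the resulting matrix

# Keep only rows whose sum is above 0
Filtered <- Counts[which(rowSums(Counts) > 0), ]
rownames(Filtered)
```

With the gene names stored as row names, rowSums only ever sees numbers, and the names survive the filtering step.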

The thing with the which

which takes care of NAs: it returns only the positions that are TRUE, so NA results are dropped. (I've added one NA in the Epsilon row to demonstrate.)

While using which also removes rows with NAs,

Counts[which(rowSums(Counts) > 0), ] 
#       cond.1 cond.2 cond.3
# Alpha     77     51     98
# Beta       0      0     71
# Gamma    823    856      0

not using it fails: the NA row sum yields a row of NAs.

Counts[rowSums(Counts) > 0, ] 
#       cond.1 cond.2 cond.3
# Alpha     77     51     98
# Beta       0      0     71
# Gamma    823    856      0
# <NA>      NA     NA     NA

However, you may want to decide if you really want to remove these rows, or do something else with them, e.g. impute or check where the NA comes from in the first place.
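As a rough sketch of "doing something else" with such rows, one could first locate them and then, if treating NA as "no count" is acceptable for your data (that is an assumption you have to judge yourself), sum with na.rm = TRUE. The matrix below just rebuilds the example data by hand; the names na_rows, keep and Kept are made up for illustration.

```r
# Example matrix mirroring the data above, with one NA in the Epsilon row
Counts <- matrix(c(77, 0, 823, 0, 59,
                   51, 0, 856, 0, NA,
                   98, 71, 0, 0, 1),
                 ncol = 3,
                 dimnames = list(c("Alpha", "Beta", "Gamma", "Delta", "Epsilon"),
                                 c("cond.1", "cond.2", "cond.3")))

# Which rows contain at least one NA?
na_rows <- rownames(Counts)[rowSums(is.na(Counts)) > 0]
na_rows

# Option: ignore NAs when summing, so Epsilon (59 + 1) is kept
keep <- rowSums(Counts, na.rm = TRUE) > 0
Kept <- Counts[keep, , drop = FALSE]
rownames(Kept)
```

Because na.rm = TRUE makes every row sum non-missing, the logical index contains no NA and which() is not needed here; drop = FALSE keeps the result a matrix even if only one row survives.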

Appendix

Matrices, unlike data.frames, can only contain one type of data, e.g. either numeric (integer, double) or character. If we had a numeric matrix m,

(m <- matrix(0, 2, 2))
#      [,1] [,2]
# [1,]    0    0
# [2,]    0    0

typeof(m)
# [1] "double"

and change only one element to character,

m[1, 2] <- 'A'
m
#      [,1] [,2]
# [1,] "0"  "A" 
# [2,] "0"  "0" 

we get a character matrix,

typeof(m)
# [1] "character"

which was the case with your Counts matrix: the Genes column turned everything into character.
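To check whether this is what happened, a quick diagnostic along these lines may help. The small data frame df is a made-up stand-in for what read.delim returns when the gene names are not used as row names.

```r
# Stand-in for a data frame read without row.names = 1:
# the gene names are an ordinary character column
df <- data.frame(Genes  = c("Alpha", "Beta"),
                 cond.1 = c(77, 0),
                 cond.2 = c(51, 0))

sapply(df, class)        # reveals the non-numeric column
typeof(as.matrix(df))    # the whole matrix is coerced to one type

# rowSums fails on the mixed data frame; capture the message instead of stopping
msg <- tryCatch(rowSums(df), error = function(e) conditionMessage(e))
msg

# Leaving out the character column makes rowSums work
sums <- rowSums(df[, -1])
sums
```

Dropping the column with df[, -1] works, but it loses the gene names, which is why reading them in as row names in the first place is the cleaner route.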


Data:

foo.csv file:

"cond.1"    "cond.2"    "cond.3"
"Alpha" 77  51  98
"Beta"  0   0   71
"Gamma" 823 856 0
"Delta" 0   0   0
"Epsilon"   59  NA  1

or alternatively

"Genes" "cond.1"    "cond.2"    "cond.3"
"Alpha" 77  51  98
"Beta"  0   0   71
"Gamma" 823 856 0
"Delta" 0   0   0
"Epsilon"   59  NA  1