Why do row names disappear after as.matrix?

I noticed that if the row names of a data frame are the sequence of numbers from 1 to the number of rows, they disappear after using as.matrix. But the row names are kept if they do not form such a sequence.

Here is a reproducible example:

test <- as.data.frame(list(x=c(0.1, 0.1, 1), y=c(0.1, 0.2, 0.3)))
rownames(test)
# [1] "1" "2" "3"

rownames(as.matrix(test))
# NULL

rownames(as.matrix(test[c(1, 3), ]))
# [1] "1" "3"

Why does this happen?

There are 5 best solutions below

Best answer

First and foremost, a matrix can always be subset by numeric position. That positional index never disappears, and we should not confuse it with row names.

as.matrix(test)[c(1, 3), ]
#        x   y
# [1,] 0.1 0.1
# [2,] 1.0 0.3

WHAT happens when using rownames() follows from its use of dimnames, as the short source code of base::rownames() shows:

function (x, do.NULL = TRUE, prefix = "row") 
{
  dn <- dimnames(x)
  if (!is.null(dn[[1L]])) 
    dn[[1L]]
  else {
    nr <- NROW(x)
    if (do.NULL) 
      NULL
    else if (nr > 0L) 
      paste0(prefix, seq_len(nr))
    else character()
  }
}

which yields NULL for dimnames(as.matrix(test))[[1]] but yields "1" "3" in the case of dimnames(as.matrix(test[c(1, 3), ]))[[1]].
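A minimal sketch of that difference, looking at dimnames directly. It reuses test from the question and only base R functions; the names dn_full and dn_sub are just illustrative.

```r
# Data frame from the question; its row names are 'automatic' (1..n).
test <- as.data.frame(list(x = c(0.1, 0.1, 1), y = c(0.1, 0.2, 0.3)))

# Full data frame: the first dimnames component (the row names) is NULL.
dn_full <- dimnames(as.matrix(test))
is.null(dn_full[[1]])
# [1] TRUE

# Rows 1 and 3 only: the row names survive the conversion.
dn_sub <- dimnames(as.matrix(test[c(1, 3), ]))
dn_sub[[1]]
# [1] "1" "3"

# The column names are kept in both cases.
dn_full[[2]]
# [1] "x" "y"
```

So rownames() itself is not dropping anything; the matrix returned by as.matrix() simply has no row dimnames in the first case.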

Note that for data frames, e.g. rownames(test), the method base:::row.names.data.frame is applied instead.

That should explain the WHAT. Fortunately you did not ask for the WHY, which would be rather opinion-based.

0
On

You can pass rownames = TRUE when you apply as.matrix (this partially matches the rownames.force argument of the data frame method):

as.matrix(test, rownames = TRUE)
#     x   y
# 1 0.1 0.1
# 2 0.1 0.2
# 3 1.0 0.3

Another answer

There is a difference between 'automatic' and non-'automatic' row names.

Here is a motivating example:

'automatic'

test <- as.data.frame(list(x = c(0.1,0.1,1), y = c(0.1,0.2,0.3)))
rownames(test)
# [1] "1" "2" "3"

rownames(as.matrix(test))
# NULL

non-'automatic'

test1 <- test
rownames(test1) <- as.character(1:3)
rownames(test1)
# [1] "1" "2" "3"

rownames(as.matrix(test1))
# [1] "1" "2" "3"

You can read about this in e.g. ?data.frame, which mentions the behavior you discovered at the end:

If row.names was supplied as NULL or no suitable component was found the row names are the integer sequence starting at one (and such row names are considered to be ‘automatic’, and not preserved by as.matrix).

When you call test[c(1, 3), ] then you create non-'automatic' rownames implicitly, which is kinda documented in ?Extract.data.frame:

If `[` returns a data frame it will have unique (and non-missing) row names.

(type `[.data.frame` into your console if you want to go deeper here.)

Others have already shown what this means for your case; see the rownames.force argument in ?matrix (the help page that also documents as.matrix):

rownames.force: ... The default, NA, uses NULL rownames if the data frame has ‘automatic’ row.names or for a zero-row data frame.
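To check whether a data frame's row names count as 'automatic', one option is the base R helper .row_names_info(), which by default reports the number of rows with a negative sign when the row names are automatic. A small sketch, reusing test from above (the name sub is only illustrative):

```r
test <- as.data.frame(list(x = c(0.1, 0.1, 1), y = c(0.1, 0.2, 0.3)))

# Negative value: the row names are 'automatic'.
.row_names_info(test)
# [1] -3

# After subsetting, the row names are stored explicitly (positive value).
sub <- test[c(1, 3), ]
.row_names_info(sub)
# [1] 2

# as.matrix() keeps the row names only in the non-automatic case.
is.null(rownames(as.matrix(test)))
# [1] TRUE
rownames(as.matrix(sub))
# [1] "1" "3"
```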

Another answer

I don't know exactly why it happens, but one way to fix it is to pass the argument rownames.force = TRUE to as.matrix:

rownames(as.matrix(test, rownames.force = TRUE))
# [1] "1" "2" "3"
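For comparison, here is a short sketch of the three settings of rownames.force on the data from the question. The variable names rn_default, rn_forced and rn_dropped are made up for this illustration.

```r
test <- as.data.frame(list(x = c(0.1, 0.1, 1), y = c(0.1, 0.2, 0.3)))

# Default (NA): 'automatic' row names are dropped.
rn_default <- rownames(as.matrix(test))

# TRUE: row names are always kept, as character strings.
rn_forced <- rownames(as.matrix(test, rownames.force = TRUE))

# FALSE: row names are always dropped, even non-automatic ones.
rn_dropped <- rownames(as.matrix(test[c(1, 3), ], rownames.force = FALSE))

rn_default   # NULL
rn_forced    # "1" "2" "3"
rn_dropped   # NULL
```

Spelling out TRUE rather than T is the safer habit, since T is an ordinary variable that can be reassigned.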

Another answer

The difference between a data frame and a matrix:

?rownames

rownames(x, do.NULL = TRUE, prefix = "row")

The important part is do.NULL, whose default is TRUE. The documentation says:

If do.NULL is FALSE, a character vector (of length NROW(x) or NCOL(x)) is returned in any case,

If the replacement versions are called on a matrix without any existing dimnames, they will add suitable dimnames. But constructions such as

rownames(x)[3] <- "c"

may not work unless x already has dimnames, since this will create a length-3 value from the NULL value of rownames(x).
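A small sketch of that pitfall, using a made-up 4-row matrix m so that the length-3 replacement value cannot fit:

```r
# A 4-row matrix without any dimnames.
m <- matrix(1:8, nrow = 4)

# rownames(m) is NULL, so rownames(m)[3] <- "c" builds the length-3 vector
# c(NA, NA, "c"), which does not match 4 rows and raises an error.
outcome <- tryCatch({
  rownames(m)[3] <- "c"
  "ok"
}, error = function(e) "error")
outcome
# [1] "error"

# Setting complete row names first makes the element-wise replacement work.
rownames(m) <- paste0("r", 1:4)
rownames(m)[3] <- "c"
rownames(m)
# [1] "r1" "r2" "c"  "r4"
```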

As I read it (this may not be entirely precise): for rownames() to return anything for a matrix, the row dimnames must have been set beforehand; otherwise you get NULL, because that is the default behaviour of rownames().

That is what you see in your example. Here you select rows 1 and 3 and get "1" and "3":

rownames(as.matrix(test[c(1, 3), ]))
# [1] "1" "3"

Here nothing is set, so you get NULL, the default:

rownames(as.matrix(test))
# NULL

You can overcome this by setting the row names explicitly beforehand:

rownames(test) <- 1:3

rownames(as.matrix(test))
# [1] "1" "2" "3"

Or, with the original test, you could use do.NULL = FALSE:

rownames(as.matrix(test), do.NULL = FALSE)
# [1] "row1" "row2" "row3"
rownames(as.matrix(test), do.NULL = FALSE, prefix = "")
# [1] "1" "2" "3"

The rownames.force argument of as.matrix has a similar effect:

rownames.force: logical indicating if the resulting matrix should have character (rather than NULL) rownames. The default, NA, uses NULL rownames if the data frame has ‘automatic’ row.names or for a zero-row data frame.