This is a bug report, not a question. The procedure to report bugs in R core appears complicated, and I don't want to be part of a mailing list. So I'm posting this here (as recommended by https://www.r-project.org/bugs.html).
Here it is:
The tapply() help of R 4.0.3 says the following on argument X:
an R object for which a split method exists. Typically vector-like, allowing subsetting with [.
Issue: this R object cannot be a data.frame, although a data.frame can be split and subsetted.
To reproduce, run the following:
func <- function(dt) {
  # sum of the row-wise products of the first two columns
  sum(dt[, 1] * dt[, 2])
}
tab <- data.frame(x = sample(100), y = sample(100), z = sample(letters[1:10], 100, replace = TRUE))
tapply(tab[, 1:2], INDEX = tab$z, FUN = func)
This results in
error in tapply(tab[, 1:2], INDEX = tab$z, FUN = func) : arguments must have same length
which, upon looking at the tapply() source code, appears to result from this check:
if (!all(lengths(INDEX) == length(X)))
  stop("arguments must have same length")
But length() is not the relevant function to call on a data.frame to determine if it has the right dimension for a split: on a data.frame it returns the number of columns, not the number of rows. nrow() should be used instead.
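To make the mismatch concrete, here is a small check. It is only a sketch that rebuilds tab as above; the set.seed() call is an addition for reproducibility and is not part of the original example.

```r
set.seed(1)  # added for reproducibility; not in the original example
tab <- data.frame(x = sample(100), y = sample(100),
                  z = sample(letters[1:10], 100, replace = TRUE))
X <- tab[, 1:2]

length(X)      # 2: number of columns
nrow(X)        # 100: number of rows
length(tab$z)  # 100: length of the grouping vector

# split() has a data.frame method and groups by row, not by column
pieces <- split(X, tab$z)
sum(sapply(pieces, nrow))  # 100: every row lands in exactly one group
```

So the grouping vector matches nrow(X), which is what split() works with, while the check in tapply() compares it against length(X).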
Replacing the above code with
if (is.data.frame(X)) {
  len <- nrow(X)
} else {
  len <- length(X)
}
if (!all(lengths(INDEX) == len))
  stop("arguments must have same length")
solves the error.
This fix looks rather straightforward, and implementing it would increase the usefulness of tapply() by a lot (I know there are powerful alternatives to tapply()), so I wonder if the current limitation reflects a design choice.
Based on the function, we could use
-output
Or using by from base R
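A minimal sketch of both routes, assuming split() combined with sapply() for the first one (that pairing is an assumption, not necessarily what was meant above) and by() for the second. tab and func are the objects from the question; set.seed() is added only for reproducibility.

```r
set.seed(42)  # added for reproducibility; not in the question
func <- function(dt) sum(dt[, 1] * dt[, 2])
tab <- data.frame(x = sample(100), y = sample(100),
                  z = sample(letters[1:10], 100, replace = TRUE))

# Route 1 (assumed): split the two columns by group, then apply func per piece
res_split <- sapply(split(tab[, 1:2], tab$z), func)

# Route 2: by() splits a data.frame by rows and applies FUN to each subset
res_by <- by(tab[, 1:2], INDICES = tab$z, FUN = func)

# Both routes should agree, one value per group in z
all.equal(as.numeric(res_split), as.numeric(unclass(res_by)))
```

Both calls pass each group to func as a data.frame, so func can keep indexing columns with dt[, 1] and dt[, 2].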
According to ?tapply (quoted in the question), X is typically vector-like. Here, tab[, 1:2] is a data.frame and not a vector. If it were a matrix, it would be a vector with dim attributes.
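The difference can be seen on a toy example. This is a sketch; df and m are made-up objects for illustration and are not part of the question.

```r
df <- data.frame(x = 1:4, y = 5:8)  # hypothetical toy data
m  <- as.matrix(df)

is.list(df)   # TRUE: a data.frame is a list of columns
length(df)    # 2: one element per column

is.atomic(m)  # TRUE: a matrix is an atomic vector plus a dim attribute
length(m)     # 8: one element per cell
dim(m)        # 4 2

# With a matrix, INDEX needs one entry per cell, e.g. the column index
col_sums <- tapply(m, INDEX = col(m), FUN = sum)
col_sums      # 10 and 26, the sums of columns x and y
```

In other words, the length check in tapply() counts cells for a matrix but columns for a data.frame, and neither count matches the number of rows used for grouping.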