How to assign multiple columns to a data.frame without repeating the function call


Why doesn't the following example work? Every row ends up with the same values, and there is a warning as well.

data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))
data[,c("d", "e")] <- sapply(data$id, function(id) {
 tmp <- slowCall(id)
 list(sum(tmp$b), min(tmp$c))
})

Warning message:
In `[<-.data.frame`(`*tmp*`, , c("d", "e"), value = list(3L, 0.104784948984161,  :
 provided 20 variables to replace 2 variables
print(data)
   id d         e
1   1 3 0.1047849
2   2 3 0.1047849
3   3 3 0.1047849
4   4 3 0.1047849
5   5 3 0.1047849
6   6 3 0.1047849
7   7 3 0.1047849
8   8 3 0.1047849
9   9 3 0.1047849
10 10 3 0.1047849

Thomas On BEST ANSWER

You could try something like this. First, vectorize the assign function (per @Joran's answer here), then modify your code slightly.

# vectorize
assignVec <- Vectorize("assign",c("x","value"))

library(plyr)
set.seed(1) # this is just here for reproducibility

data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))

# I store this as `tmp` just to make the code below look cleaner
tmp <- mlply(sapply(data$id, function(id) {
    tmp <- slowCall(id)
    list(sum(tmp$b), min(tmp$c))
}), c)

# here's the key part:
data <- within(data, assignVec(c('d','e'), tmp, envir=environment()))

Output:

> data
   id          e  d
1   1 0.26550866  3
2   2 0.20168193  6
3   3 0.62911404  9
4   4 0.06178627 12
5   5 0.38410372 15
6   6 0.49769924 18
7   7 0.38003518 21
8   8 0.12555510 24
9   9 0.01339033 27
10 10 0.34034900 30

Note: I invoke plyr::mlply to get your sapply output into a list.

The simpler answer, though, is to change the right-hand side of your assignment into:

data[,c("d", "e")] <- as.data.frame(t(sapply(data$id, function(id) {
 tmp <- slowCall(id)
 list(sum(tmp$b), min(tmp$c))
})))

which gives the same values, although d and e are then stored as list columns because the function returns a list rather than a vector.
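If plain atomic columns are wanted after the as.data.frame(t(sapply(...))) approach, the list columns can be flattened afterwards. The following is only a sketch under that assumption; classes_before is a hypothetical helper name used for illustration.

```r
set.seed(1)  # only for reproducibility
data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))

data[, c("d", "e")] <- as.data.frame(t(sapply(data$id, function(id) {
  tmp <- slowCall(id)
  list(sum(tmp$b), min(tmp$c))
})))

# Inspect the column classes: d and e are expected to be list columns here
classes_before <- sapply(data, class)
print(classes_before)

# unlist() each new column to turn it into an ordinary atomic vector
data[c("d", "e")] <- lapply(data[c("d", "e")], unlist)
print(sapply(data, class))
```

After flattening, d and e behave like ordinary columns in arithmetic and summaries, which list columns generally do not.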

shadow On

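The problem here is that your function returns a list, so sapply produces a 2 x 10 matrix whose cells are one-element lists instead of numeric values, and the assignment then sees 20 separate variables. Return a vector with c() instead of list() and transpose the output so that each id becomes a row, then it will work.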

data[, c("d", "e")] <- t(sapply(data$id, function(id) {
  tmp <- slowCall(id)
  c(sum(tmp$b), min(tmp$c))
}))
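To see the difference for yourself, you can compare the shape and type of the two sapply results. This is a small diagnostic sketch; res_list and res_num are names made up for illustration.

```r
set.seed(1)  # only for reproducibility
data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))

# Same call as in the question: each iteration returns a two-element list
res_list <- sapply(data$id, function(id) {
  tmp <- slowCall(id)
  list(sum(tmp$b), min(tmp$c))
})
print(dim(res_list))      # 2 rows (d, e) by 10 columns (one per id)
print(is.list(res_list))  # the 20 cells are list elements

# Returning an atomic vector and transposing gives one row per id
res_num <- t(sapply(data$id, function(id) {
  tmp <- slowCall(id)
  c(sum(tmp$b), min(tmp$c))
}))
print(dim(res_num))       # 10 rows by 2 columns
print(is.numeric(res_num))
```

Note that c() coerces both values to double, so d is no longer stored as an integer in the matrix version.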

Blue Magister On

Here's a generic method to add two columns of different data types (e.g. character and numeric). It builds one list per row and then transposes the list of lists (via this answer).

In this example, that keeps d as an integer and e as a numeric, whereas a matrix has to coerce both to a single type.

rowwise <- lapply(data$id, function(id) {
  tmp <- slowCall(id)
  list(sum(tmp$b), min(tmp$c))
})
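# Transpose: collect the i-th element of every row into one vector.
# sapply() (rather than lapply()) simplifies each column to an atomic vector.
colwise <- lapply(seq_along(rowwise[[1]]), function(i) sapply(rowwise, "[[", i))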

data[,c("d", "e")] <- colwise
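As an illustration of the mixed-type case, here is a sketch that returns a character label next to a numeric value and transposes with do.call(Map, ...) instead of the nested loop. The label column and its "id_" prefix are hypothetical and not part of the original question.

```r
set.seed(1)  # only for reproducibility
data <- data.frame(id = 1:10)
slowCall <- function(id) data.frame(b = rep(id, 3), c = runif(3))

# Hypothetical variant: a character label alongside a numeric value
rowwise <- lapply(data$id, function(id) {
  tmp <- slowCall(id)
  list(paste0("id_", id), min(tmp$c))
})

# Transpose: Map() calls c() on all first elements, then on all second elements
colwise <- do.call(Map, c(f = c, rowwise))

data[, c("label", "e")] <- colwise
str(data)
```

Because each column is combined separately with c(), the character and numeric values never share a matrix, so neither is coerced to the other's type.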