How to calculate the sum of the product of two columns in a dataset in R, using loops

I have a large data table with over 300 columns. For each letter column, I would like to get:

-- the sum of (each observation in the column * the weight of that observation).

-- the sum of the weights of the observations whose value in the letter column is greater than 0.

Here is an example for one column (a):

 id <- c("0001", "0002", "0003", "0004")
 a <- c(0, 9, 8, 5)
 b <- c(0,5,5,0)
 c <- c(1.5, 0.55, 0, 0.06)
 weight <- c(102.354, 34.998, 84.664, .657)
 data <- data.frame(id, a, b, c, weight)
 data
   id a b    c  weight
 1 0001 0 0 1.50 102.354
 2 0002 9 5 0.55  34.998
 3 0003 8 5 0.00  84.664
 4 0004 5 0 0.06   0.657
 sum(data$a * data$weight)
[1] 995.579
 sum(data$weight[data$a >0])
[1] 120.319

Any idea?

There are 2 best solutions below

Best answer

The following code should solve your problem:

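# Columns whose names are single lowercase letters
my.names <- names(data)[names(data) %in% letters]

# For each such column: weighted sum, then sum of the weights where the value is > 0
res <- lapply(my.names, function(x){
  c(sum(data[[x]]*data[["weight"]]), sum(data[["weight"]][data[[x]]>0]))
})

names(res) <- my.names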

Or build a data.frame directly:

do.call("rbind", lapply(my.names, function(letter){
  data.frame(letter, "sum1_name" = sum(data[[letter]]*data[["weight"]]), 
             "sum2_name" = sum(data[["weight"]][data[[letter]]>0]))
}))

# letter sum1_name sum2_name
# 1      a  995.5790   120.319
# 2      b  598.3100   119.662
# 3      c  172.8193   138.009
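Note that names(data) %in% letters only matches columns whose names are a single lowercase letter. With over 300 columns the real names are probably different, so here is a minimal sketch that selects every column except id and weight and uses the vectorized colSums() instead of an explicit loop. It assumes all remaining columns are numeric and contain no NA values; value.cols, vals, sum1 and sum2 are illustrative names, not part of the answer above.

```r
# Example data from the question
data <- data.frame(
  id     = c("0001", "0002", "0003", "0004"),
  a      = c(0, 9, 8, 5),
  b      = c(0, 5, 5, 0),
  c      = c(1.5, 0.55, 0, 0.06),
  weight = c(102.354, 34.998, 84.664, 0.657)
)

# Every column except the identifier and the weight
value.cols <- setdiff(names(data), c("id", "weight"))
vals <- as.matrix(data[value.cols])

# weight is recycled down the rows, so each column is multiplied element-wise
sum1 <- colSums(vals * data$weight)

# TRUE/FALSE is coerced to 1/0, so only weights of positive observations count
sum2 <- colSums((vals > 0) * data$weight)

data.frame(letter = value.cols, sum1 = sum1, sum2 = sum2, row.names = NULL)
```

For the example data this should give the same numbers as the table above. If the real data contains missing values, na.rm = TRUE in colSums() may be worth considering.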

A possible data.table solution

You could define a helper function

tempfunc <- function(x) c(sum(x * data$weight), sum(data$weight[x > 0]))

Then do either

library(data.table)
setDT(data)[, lapply(.SD, tempfunc), .SDcols = -c("id", "weight")]
#          a       b        c
# 1: 995.579 598.310 172.8193
# 2: 120.319 119.662 138.0090

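Or, with dplyr (summarise_each() and funs() are deprecated in current dplyr releases, so this may need adapting)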

library(dplyr)
setDT(data) %>% summarise_each(funs(tempfunc), -c(id, weight))
##          a       b        c
## 1: 995.579 598.310 172.8193
## 2: 120.319 119.662 138.0090
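The helper above reads data$weight from the enclosing environment. As a rough sketch, the same idea can be written with the weights passed in as an argument and applied with base R's sapply(), which avoids any extra package. The names weighted_sums, value.cols and out are hypothetical and not part of the answer above.

```r
# Example data from the question
data <- data.frame(
  id     = c("0001", "0002", "0003", "0004"),
  a      = c(0, 9, 8, 5),
  b      = c(0, 5, 5, 0),
  c      = c(1.5, 0.55, 0, 0.06),
  weight = c(102.354, 34.998, 84.664, 0.657)
)

# Returns c(weighted sum, sum of the weights where x > 0);
# w is passed explicitly instead of being looked up in the global data
weighted_sums <- function(x, w) {
  c(sum(x * w), sum(w[x > 0]))
}

# Apply the helper to every column except id and weight
value.cols <- setdiff(names(data), c("id", "weight"))
out <- sapply(data[value.cols], weighted_sums, w = data$weight)
out
```

The result is a matrix with one column per input column and two rows, in the same layout as the data.table output above.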