Conditionally apply a function to values over a certain value


I'm sure there is an easy solution to this, but I cannot seem to get the correct values. I have a data frame and would like to calculate the average of each column using only the values above a threshold, in this case 150.

df1 <- as.data.frame(matrix(sample(0:1000, 36*10, replace=TRUE), ncol=1))
df2 <- as.data.frame(matrix(sample(0:500, 36*10, replace=TRUE), ncol=1))
df3 <- as.data.frame(matrix(sample(0:200, 36*10, replace=TRUE), ncol=1))
Example <- cbind(df1,df2,df3)

Similar work I've done suggests apply may be the most effective way (I have tried to follow the steps at http://rforpublichealth.blogspot.co.uk/2012/09/the-infamous-apply-function.html). However, the following code gives wrong results: every output is below 1, even though I am trying to average only the values above 150.

test<- apply(Example,2,function(x) {mean(x > 150)})

Any help would be highly appreciated. Thank you!

3 Answers

Best answer

You were close, but need to do mean(x[x > 150]) rather than mean(x > 150):

test<- apply(Example,2,function(x) {mean(x[x > 150])})

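This works because x[x > 150] says "take all values of x where x is above 150". On its own, x > 150 is a logical vector of TRUE/FALSE values, so mean(x > 150) returns the proportion of values above 150, which is why your results were all below 1.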
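A minimal sketch of the difference, using a made-up three-element vector (not the question's random data) so the results can be checked by hand:

```r
x <- c(100, 200, 300)   # toy vector, chosen only for illustration

x > 150                 # logical vector: FALSE TRUE TRUE
prop_above <- mean(x > 150)      # TRUE counts as 1, so this is the share above 150 (2 of 3)
x[x > 150]              # the values themselves: 200 300
mean_above <- mean(x[x > 150])   # their average: 250
```

The first mean averages TRUE/FALSE flags, while the second averages the selected values.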

Answer 2

For the mean of all values above 150 across the whole data frame:

mean(as.matrix(Example)[as.matrix(Example) > 150])
[1] 426.0402

By column:

sapply(Example, function(x) mean(x[x > 150]))
      V1       V1       V1 
575.6926 332.9713 175.6809 
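As a small check of the by-column approach, here is a sketch on a hypothetical data frame (`toy`, with invented column names and values) that also shows one edge case: a column with no values above the threshold yields NaN, because the mean of an empty vector is NaN.

```r
# Hypothetical data, small enough to verify by hand
toy <- data.frame(a = c(100, 200, 300),
                  b = c(160,  50, 400),
                  c = c( 10,  20,  30))

# Same pattern as above: keep values over 150, then average per column
res <- sapply(toy, function(x) mean(x[x > 150]))
res   # a is 250, b is 280, c is NaN (nothing in c exceeds 150)
```

If NaN is not wanted for such columns, the anonymous function would need an explicit check on the length of the subset.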

Answer 3

A potentially faster option is to use matrix subsetting to select the qualifying values before computing the mean by column:

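m <- as.matrix(Example)                 ## convert once so matrix indexing is cheap
ids <- which(m > 150, arr.ind = TRUE)   ## (row, col) positions of all values above 150
sapply(seq_len(ncol(m)),                ## compute the mean for each column
       function(j) mean(m[ids[ids[, 2] == j, , drop = FALSE]]))  ## drop = FALSE keeps a single match as a matrix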
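To see what the matrix-subsetting approach is doing, the sketch below runs it on a made-up 3 x 2 matrix (the values are assumptions chosen for illustration). `which(..., arr.ind = TRUE)` returns one (row, col) pair per matching cell, and indexing a matrix by such a two-column matrix picks out exactly those cells.

```r
# Toy matrix, filled by column: column 1 is 100, 200, 300; column 2 is 160, 50, 400
m <- matrix(c(100, 200, 300,
              160,  50, 400), ncol = 2)

ids <- which(m > 150, arr.ind = TRUE)   # positions (2,1), (3,1), (1,2), (3,2)
selected <- m[ids]                      # values at those positions: 200 300 160 400

# Mean of the selected values, one column at a time;
# drop = FALSE keeps the index a matrix even when only one row matches
col_means <- sapply(seq_len(ncol(m)),
                    function(j) mean(m[ids[ids[, 2] == j, , drop = FALSE]]))
col_means                               # 250 280
```

Whether this is actually faster than the plain sapply version depends on the data size, so it is worth timing both on your own data.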