require('dplyr')
set.seed(8)
df <- data.frame(v1 = rnorm(5),
                 v2 = rnorm(5),
                 v3 = rnorm(5))
If I wish to count the number of values above, say, 0 and put this in a new column, I would do:
mutate(df, n=apply(df,1,function(x)sum(x>0)))
This would give:
           v1         v2          v3 n
1 -0.08458607 -0.1078814 -0.75979380 0
2  0.84040013 -0.1702891  0.29204986 2
3 -0.46348277 -1.0883317  0.42139859 1
4 -0.55083500 -3.0110517 -1.29448908 0
5  0.73604043 -0.5931743  0.06928509 2
Now I want to use dplyr with chaining and do the same thing on a subset of columns, v1 and v2, but I cannot figure out how to give apply the right data. If I just do this (after recreating df, of course):
df %>%
select(v1, v2) %>%
mutate(n=apply(df,1,function(x)sum(x>0)))
...it gives the same as above (same n, i.e. it counts across all three columns), while passing the data with . or leaving the argument blank does not work:
df %>%
select(v1, v2) %>%
mutate(n=apply(.,1,function(x)sum(x>0)))
or:
df %>%
select(v1, v2) %>%
mutate(n=apply(1,function(x)sum(x>0)))
What's wrong?
After using select to subset the columns that are needed, apply the rowwise() function and then use do. Here, . refers to the data frame that we got after the select step. When we do sum(.>0), that function is applied to each row of the new dataset. Lastly, data.frame(., n=..) returns all the previous columns along with the newly created n.
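The steps described above might be put together roughly as follows. This is only a sketch: it reuses the df from the question, assumes a dplyr version in which do() is still available, and the name res is introduced here purely for illustration.

```r
library(dplyr)

# Same example data as in the question
set.seed(8)
df <- data.frame(v1 = rnorm(5),
                 v2 = rnorm(5),
                 v3 = rnorm(5))

res <- df %>%
  select(v1, v2) %>%                    # keep only the columns to count over
  rowwise() %>%                         # treat each row as its own group
  do(data.frame(., n = sum(. > 0)))     # per row: original values plus the count of values > 0

print(res)
```

Because v3 is dropped before the count is made, n only reflects v1 and v2. For the values printed in the question, that should give n = 0, 1, 0, 0, 1.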