Rowwise Column Count in Dataframe

270 Views Asked by At

Let's say I have the following dataframe

country_df <- tibble(
  population = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

All I need is a concise way inside the mutate function to get the number of columns (population to population3) that are greater than the value of the pop column within each row.

So what I need is the following results (more specifically the GreaterTotal column) Note: I can get the answer by working through each column but it would take a while with more columns)

  population population2 population3   pop GreaterThan0 GreaterThan1 GreaterThan2 GreaterTotal
       <dbl>       <dbl>       <dbl> <dbl> <lgl>        <lgl>        <lgl>               <int>
1        328         133          89    45 TRUE         TRUE         TRUE                    3
2         38          12          39    45 FALSE        FALSE        FALSE                   0
3         30          99          33    45 FALSE        TRUE         FALSE                   1
4         56          83          56    45 TRUE         TRUE         TRUE                    3
5       1393        1033         193    45 TRUE         TRUE         TRUE                    3
6        126         101         126    45 TRUE         TRUE         TRUE                    3
7         57          33          58    45 TRUE         FALSE        TRUE                    2

I've tried using apply with the row index, but I can't get at it. Can somebody please point me in the right direction?

2

There are 2 best solutions below

0
On BEST ANSWER

You can select the 'Population' columns and compare those column with pop and use rowSums to count how many of them are greater in each row.

cols <- grep('population', names(country_df))
country_df$GreaterTotal <- rowSums(country_df[cols] > country_df$pop)

#  population population2 population3   pop GreaterTotal
#       <dbl>       <dbl>       <dbl> <dbl>        <dbl>
#1        328         133          89    45            3
#2         38          12          39    45            0
#3         30          99          33    45            1
#4         56          83          56    45            3
#5       1393        1033         193    45            3
#6        126         101         126    45            3
#7         57          33          58    45            2

In dplyr 1.0.0, you can do this with rowwise and c_across :

country_df %>%
  rowwise() %>%
  mutate(GreaterTotal = sum(c_across(population:population3) > pop))
0
On

Using tidyverse, we can do

library(dplyr)
country_df %>%
      mutate(GreaterTotal = rowSums(select(., 
              starts_with('population')) > .$pop) )

-output

# A tibble: 7 x 5
#  population population2 population3   pop GreaterTotal
#       <dbl>       <dbl>       <dbl> <dbl>        <dbl>
#1        328         133          89    45            3
#2         38          12          39    45            0
#3         30          99          33    45            1
#4         56          83          56    45            3
#5       1393        1033         193    45            3
#6        126         101         126    45            3
#7         57          33          58    45            2