How can I add a counter for each time a row contains a character in a string?


I have a data frame whose columns are mostly character strings, which I'm matching against a list of patterns. I'm trying to add a column at the end that counts how many cells in each row match one of the patterns. Basically, this is what I'm going for:

patterns <- c("Yes", "No", "Maybe")
df <- data.frame(first_column  = c("Why", "Sure", "But", ...),
                 second_column = c("Yes", "Okay", "If Only", ...),
                 third_column  = c("No", "When", "Maybe so", ...),
                 fourth_column = c("But", "I won't", "Truth", ...)
                 )

After running the code, a fifth column labeled "counter" should contain 2, 0, 1, ... Right now I'm doing this with a pair of nested for loops with an if statement inside. It works on toy datasets, but I think it will break on the full-size data. Is there a better way using dplyr, grepl, or lapply? My instinct says dplyr, but I'm not sure how to do it. My code is below:

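filename <- choose.files(caption = "Select File")  # choose.files() is Windows-only
cases <- read.csv(filename)
cases <- cbind(cases, counter = 0)
n_rows <- nrow(cases)
n_cols <- ncol(cases) - 1  # skip the counter column that was just added
for (i in seq_len(n_rows)) {
  for (j in seq_len(n_cols)) {
    # exact match: the cell must equal one of the patterns
    if (cases[i, j] %in% patterns) {
      cases$counter[i] <- cases$counter[i] + 1
    }
  }
}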
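For comparison, here is a minimal loop-free sketch of what the nested loops compute, using only base R. It assumes every column holds character (or factor) values, and count_exact_hits is a hypothetical helper name. Like the loop, it uses exact matching with %in%, so "Maybe so" is not counted and row 3 comes out as 0 rather than the 1 described in the question.

```r
patterns <- c("Yes", "No", "Maybe")
df <- data.frame(first_column  = c("Why", "Sure", "But"),
                 second_column = c("Yes", "Okay", "If Only"),
                 third_column  = c("No", "When", "Maybe so"),
                 fourth_column = c("But", "I won't", "Truth"))

# Hypothetical helper: %in% is vectorized over a whole column, so lapply()
# yields one logical vector per column; Reduce() adds them element-wise
# (starting from 0L) into an integer count per row.
count_exact_hits <- function(data, pats) {
  Reduce(`+`, lapply(data, function(col) col %in% pats), 0L)
}

df$counter <- count_exact_hits(df, patterns)
df$counter
```

Each column is handled in one vectorized call, so the work grows with the number of columns rather than with rows times columns of interpreted loop iterations.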
There are 3 answers below.

Answer 1 (best answer)

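Try this. It collapses the patterns into one regular expression and tests each cell with grepl(), so it counts substring matches (for example, "Maybe so" matches "Maybe"):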

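# Collapse the patterns into one regex ("Yes|No|Maybe"), test every cell
# with grepl(), and sum the TRUE values across each row
df$counter <- rowSums(vapply(df, function(x, p) grepl(p, x), logical(nrow(df)),
                             paste0(patterns, collapse = "|")))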

Output

> df
  first_column second_column third_column fourth_column counter
1          Why           Yes           No           But       2
2         Sure          Okay         When       I won't       0
3          But       If Only     Maybe so         Truth       1
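Because the regex is a plain alternation, "No" would also match cells such as "Nothing" or "None". If only whole words should count, one option (a sketch, not part of the answer above) is to wrap the alternation in word boundaries. count_word_hits is a hypothetical helper name, and perl = TRUE is used so that \b is handled by PCRE. Note that grepl() returns one TRUE/FALSE per cell, so a cell containing several patterns still counts once.

```r
patterns <- c("Yes", "No", "Maybe")
df <- data.frame(first_column  = c("Why", "Sure", "But"),
                 second_column = c("Yes", "Okay", "If Only"),
                 third_column  = c("No", "When", "Maybe so"),
                 fourth_column = c("But", "I won't", "Truth"))

# Plain alternation: matches a pattern anywhere inside a cell
substring_re <- paste0(patterns, collapse = "|")
# Word-boundary version: "\\b(Yes|No|Maybe)\\b" only matches whole words
word_re <- paste0("\\b(", paste0(patterns, collapse = "|"), ")\\b")

grepl(substring_re, c("Maybe so", "Nothing", "No"), perl = TRUE)  # TRUE TRUE TRUE
grepl(word_re,      c("Maybe so", "Nothing", "No"), perl = TRUE)  # TRUE FALSE TRUE

# Hypothetical helper: one logical vector per column, summed per row
count_word_hits <- function(data, re) {
  Reduce(`+`, lapply(data, function(col) grepl(re, col, perl = TRUE)), 0L)
}

df$counter <- count_word_hits(df, word_re)
df$counter
```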
Answer 2

We could do this column-wise with map() and reduce(). This uses exact matching (==), so "Maybe so" is not counted:

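library(dplyr)
library(purrr)
# For each pattern, count exact matches per row, then add the counts together.
# cur_data() is deprecated as of dplyr 1.1.0; pick(everything()) replaces it there.
df %>%
  mutate(counter = map(patterns, ~ rowSums(cur_data() == .x)) %>%
           reduce(`+`))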
#  first_column second_column third_column fourth_column counter
#1          Why           Yes           No           But       2
#2         Sure          Okay         When       I won't       0
#3          But       If Only     Maybe so         Truth       0

Data

df <- structure(list(first_column = c("Why", "Sure", "But"),
                     second_column = c("Yes", "Okay", "If Only"),
                     third_column = c("No", "When", "Maybe so"),
                     fourth_column = c("But", "I won't", "Truth")),
                class = "data.frame", row.names = c(NA, -3L))
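If you'd rather avoid the purrr and dplyr dependencies, the same map-then-reduce idea can be sketched in base R with lapply() and Reduce(). This assumes df contains only the four character columns from the data above; run it before adding counter, otherwise the new column is compared as well.

```r
patterns <- c("Yes", "No", "Maybe")
df <- structure(list(first_column = c("Why", "Sure", "But"),
                     second_column = c("Yes", "Okay", "If Only"),
                     third_column = c("No", "When", "Maybe so"),
                     fourth_column = c("But", "I won't", "Truth")),
                class = "data.frame", row.names = c(NA, -3L))

# df == p compares every cell with one pattern and returns a logical matrix;
# rowSums() turns it into an exact-match count per row (the role of map()).
counts_per_pattern <- lapply(patterns, function(p) rowSums(df == p))

# Add the per-pattern counts element-wise (the role of reduce(`+`)).
df$counter <- Reduce(`+`, counts_per_pattern)
df
```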
Answer 3

In dplyr we can use rowwise() with c_across(). Like the loop in the question, %in% does exact matching, so "Maybe so" is not counted:

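library(dplyr)
df %>%
  rowwise() %>%                  # evaluate the mutate() one row at a time
  mutate(counter = sum(c_across(everything()) %in% patterns)) %>%
  ungroup()                      # drop the rowwise grouping afterwards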

#  first_column second_column third_column fourth_column counter
#  <chr>        <chr>         <chr>        <chr>           <int>
#1 Why          Yes           No           But                 2
#2 Sure         Okay          When         I won't             0
#3 But          If Only       Maybe so     Truth               0