Filter rows for multiple strings

I have a dataset which has this structure

df
    row      string
    1        apple, banana, orange, melon, strawberry
    2        blackberry, banana
    3        strawberry, melon, pineapple, apple
    4        orange, pineapple, orange
    5        coconut, apple, orange, melon
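
For a reproducible example, the data can be recreated roughly as follows. This is only a sketch: the column names come from the printout above, and the column types (integer and character) are an assumption.

```r
# Rebuild the example data shown in the printout above.
# Assumption: row is an integer column and string is a character column.
df <- data.frame(
  row = 1:5,
  string = c(
    "apple, banana, orange, melon, strawberry",
    "blackberry, banana",
    "strawberry, melon, pineapple, apple",
    "orange, pineapple, orange",
    "coconut, apple, orange, melon"
  ),
  stringsAsFactors = FALSE
)
```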

I would like to filter the rows by several target strings: strawberry, banana, or apple. I want to get back all rows that contain at least one of these strings.

I tried to solve it with grepl, taking the idea from https://www.statology.org/filter-rows-that-contain-string-dplyr/

This is what I have tried so far:

fruit1 <- c("strawberry | banana | apple")

df1 <- filter(df, grepl(fruit1, string))

However, it does not seem to work as expected.

There is 1 solution below

akrun

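The spaces around | in the attempted pattern are part of the regular expression and have to match literally, so they should be dropped. We can use str_detect with the alternatives written without spaces: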

library(dplyr)
library(stringr)

# keep rows whose string contains any of the alternatives (no spaces around |)
df %>%
    filter(str_detect(string, "strawberry|banana|apple"))
  row                                   string
1   1 apple, banana, orange, melon, strawberry
2   2                       blackberry, banana
3   3      strawberry, melon, pineapple, apple
4   4                orange, pineapple, orange
5   5            coconut, apple, orange, melon
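
Note that row 4 is returned as well, because the regular expression matches substrings and "pineapple" contains "apple". If only whole fruit names should count, one option is to wrap the alternatives in word boundaries (\\b). The sketch below assumes that whole-word matching is what is wanted. The names targets, pattern, res and res_base are made up for illustration, and the data frame is rebuilt from the printout so the snippet runs on its own.

```r
library(dplyr)
library(stringr)

# Example data rebuilt from the question (column types are an assumption)
df <- data.frame(
  row = 1:5,
  string = c(
    "apple, banana, orange, melon, strawberry",
    "blackberry, banana",
    "strawberry, melon, pineapple, apple",
    "orange, pineapple, orange",
    "coconut, apple, orange, melon"
  ),
  stringsAsFactors = FALSE
)

# Build the pattern from a vector of targets, with no spaces around |
targets <- c("strawberry", "banana", "apple")
pattern <- paste0("\\b(", paste(targets, collapse = "|"), ")\\b")

# \\b requires a word boundary, so "apple" inside "pineapple" no longer matches
res <- df %>%
  filter(str_detect(string, pattern))

# Base R equivalent with grepl; perl = TRUE selects PCRE for the \\b syntax
res_base <- df[grepl(pattern, df$string, perl = TRUE), ]

res$row  # rows 1, 2, 3 and 5 remain; row 4 is dropped
```

Building the pattern with paste(collapse = "|") also avoids typing stray spaces by hand, which is what broke the original grepl attempt.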