Text mining between a data frame column and 2 lists in R


I created two lists of words (character vectors):

fruits <- c("banana","apple","strawberry")
homemade <- c("kitchen","homemade","mom","dad","sister")

And here is my dataset:

description                 isCake
apple cake cooked by mom    YES
pie from the bakery         NO
strawberry dessert by dad   NO

I want to write text-mining code so that when df$description contains one or more words from fruits AND one or more words from homemade, df$isCake becomes "OK".

Expected output

description                 isCake
apple cake cooked by mom    YES
pie from the bakery         NO
strawberry dessert by dad   OK

Here is what I tried:

df <- df %>% mutate(isCake=ifelse(description %in% fruits & description %in% homemade, "OK", isCake))

I get no error messages, but apparently it doesn't work: when I subset on isCake == "OK", I always get 0 observations.

Best answer

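The %in% test fails because it compares each whole description against the words and is TRUE only on exact equality, which never happens here. Instead, you can collapse each of the fruits and homemade vectors into a regular-expression pattern and use it in grepl: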

df$isCake[grepl(paste0(fruits, collapse = '|'), df$description) & 
          grepl(paste0(homemade, collapse = '|'), df$description)] <- 'OK'

df
#                description isCake
#1  apple cake cooked by mom     OK
#2       pie from the bakery     NO
#3 strawberry dessert by dad     OK
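
Note that the plain pattern above matches substrings, so "apple" would also match inside "pineapple". A minimal sketch of a whole-word variant follows. It rests on a few assumptions that are not in the original answer: word_pattern is a hypothetical helper name, matching is made case-insensitive, and rows already marked "YES" are left alone, as the expected output in the question suggests.

```r
# Rebuild the example data from the question
fruits   <- c("banana", "apple", "strawberry")
homemade <- c("kitchen", "homemade", "mom", "dad", "sister")

df <- data.frame(
  description = c("apple cake cooked by mom",
                  "pie from the bakery",
                  "strawberry dessert by dad"),
  isCake = c("YES", "NO", "NO"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: wrap the alternatives in \b...\b so that only
# whole words match (e.g. "apple" does not match inside "pineapple")
word_pattern <- function(words) {
  paste0("\\b(", paste(words, collapse = "|"), ")\\b")
}

has_fruit    <- grepl(word_pattern(fruits), df$description,
                      ignore.case = TRUE, perl = TRUE)
has_homemade <- grepl(word_pattern(homemade), df$description,
                      ignore.case = TRUE, perl = TRUE)

# Assumption: only rows currently marked "NO" are updated,
# so an existing "YES" is kept as it is
df$isCake[has_fruit & has_homemade & df$isCake == "NO"] <- "OK"

df
```

With the sample data, isCake ends up as YES, NO, OK. Dropping the df$isCake == "NO" condition reproduces the result shown above, where the first row also becomes OK.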