Difference between using str_detect() and contains()?


I know it might be a silly question, but I was curious whether there is any difference. I prefer using str_detect() because the syntax makes more sense to me.



Answer by jpsmith (best answer)

Yes, there are substantial differences. First, contains() is a "selection helper" that must be used within a (generally tidyverse) selecting function such as dplyr::select().

So you can't use contains() on a vector or as a standalone function, i.e., you can't do:

x <- c("Hello", "and", "welcome (example)") 

tidyselect::contains("Hello", x)

Or you get the error:

Error: ! contains() must be used within a selecting function.

Whereas stringr::str_detect() works with vectors and as a standalone function:

stringr::str_detect(x, "Hello")

Returns:

[1]  TRUE FALSE FALSE
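Since str_detect() returns an ordinary logical vector, it can also drive base R subsetting directly. A minimal sketch, reusing the same x vector (the pattern "e" is just an illustrative choice):

```r
library(stringr)

x <- c("Hello", "and", "welcome (example)")

# Logical vector: which elements contain the letter "e"?
has_e <- str_detect(x, "e")

# Use it to keep only the matching elements ("Hello" and "welcome (example)")
x[has_e]
```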

Second, stringr::str_detect() interprets its pattern as a regular expression by default, whereas tidyselect::contains() only looks for literal strings.

For example, the following works:

library(dplyr)

df <- data.frame(col1 = c("Hello", "and", "welcome (example)"))

df %>% 
  select(contains("1"))

#               col1
# 1             Hello
# 2               and
# 3 welcome (example)

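But this does not select col1 (it returns a data frame with no columns), because contains() searches the column names for the literal characters \d instead of treating the pattern as a regex:

df %>% select(contains("\\d"))

(\\d is the R regex for "any digit".)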
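To see the regex-versus-literal distinction on the stringr side, str_detect() can be told to match literally by wrapping the pattern in stringr::fixed(), which is roughly what contains() does. A minimal sketch; the col_names vector is a hypothetical example:

```r
library(stringr)

col_names <- c("col1", "col")  # hypothetical column names

# Default: the pattern is a regex, so "\\d" means "any digit"
str_detect(col_names, "\\d")          # TRUE FALSE

# fixed(): the pattern is a literal string, so this looks for a
# backslash followed by "d", which neither name contains
str_detect(col_names, fixed("\\d"))   # FALSE FALSE
```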

Additionally, as noted by @abagail, contains() looks at column names, not at the values stored within the columns. For instance, df %>% select(contains("1")) worked above to return the column col1 (since there is a "1" in the column name). But trying to filter on the values that contain a certain pattern does not work:

df %>% 
  filter(contains("Hello"))

This fails with the same error:

Caused by error: ! contains() must be used within a selecting function.

But you can filter on the values in the columns using stringr::str_detect():

df %>% 
  filter(stringr::str_detect(col1, "Hello"))

#    col1
# 1 Hello
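The two can also be combined: contains() chooses which columns to look at by name, and str_detect() tests the values inside them. A minimal sketch, assuming dplyr 1.0.4 or later (for if_any()); the extra columns col2 and other are hypothetical:

```r
library(dplyr)
library(stringr)

df <- data.frame(
  col1  = c("Hello", "and", "welcome (example)"),
  col2  = c("x", "Hello there", "y"),    # hypothetical extra column
  other = c("Hello", "Hello", "Hello"),  # name lacks "col", so it is ignored
  stringsAsFactors = FALSE
)

# Keep rows where any column whose *name* contains "col"
# has a *value* matching "Hello" (rows 1 and 2 here)
res <- df %>%
  filter(if_any(contains("col"), ~ str_detect(.x, "Hello")))
```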

Lastly, if you are looking for similar functions outside of stringr, note that tidyselect::matches() accepts a regex. As @GregorThomas aptly points out in the comments:

"tidyselect::matches is a much closer analog to str_detect(), though still, as a selection helper, it is only for use within a selecting function."
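A minimal sketch of that difference inside select(), reusing the one-column df from above: matches() reads "\\d" as a regex, while contains() reads it literally.

```r
library(dplyr)

df <- data.frame(col1 = c("Hello", "and", "welcome (example)"))

# matches() treats its argument as a regex, so "\\d" finds the "1" in "col1"
names(df %>% select(matches("\\d")))   # "col1"

# contains() looks for the literal characters \d, so nothing is selected
names(df %>% select(contains("\\d")))  # character(0)
```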

str_detect() is also equivalent to base R's grepl(), though the order of the pattern and string arguments is reversed (i.e., str_detect(string, pattern) is equivalent to grepl(pattern, string)).
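A quick check of the reversed argument order. One caveat worth knowing: the two handle missing values differently, since str_detect() propagates NA while grepl() returns FALSE for it. The vector y is a hypothetical example:

```r
library(stringr)

x <- c("Hello", "and", "welcome (example)")

# Same result, arguments in opposite order
identical(str_detect(x, "Hello"), grepl("Hello", x))  # TRUE

# Difference on missing values
y <- c("Hello", NA)  # hypothetical vector with a missing value
str_detect(y, "Hello")  # TRUE NA
grepl("Hello", y)       # TRUE FALSE
```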

Answer by cravetheflame

Most importantly, contains() can only be used inside a selecting function such as select(), whereas str_detect() can be used anywhere; see @jpsmith's answer.

Furthermore, as mentioned in the documentation of tidyselect::contains():

[...]contains(): Contains a literal string.

[...] starts_with(), ends_with(), and contains() do not use regular expressions. [...]

[...] For starts_with(), ends_with(), and contains() this is an exact match.

Whereas the documentation of stringr::str_detect() says:

pattern : Pattern to look for. The default interpretation is a regular expression.
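Another default that differs: tidyselect::contains() has ignore.case = TRUE, while str_detect() is case-sensitive unless the pattern is wrapped in regex() or fixed() with ignore_case = TRUE. A minimal sketch, reusing the one-column df from the other answer; the pattern "COL" is an illustrative choice:

```r
library(dplyr)
library(stringr)

df <- data.frame(col1 = c("Hello", "and", "welcome (example)"))

# contains() ignores case by default (ignore.case = TRUE)
names(select(df, contains("COL")))                     # "col1"

# str_detect() is case-sensitive by default
str_detect("col1", "COL")                              # FALSE

# Opt in to case-insensitive matching, as a regex or as a literal
str_detect("col1", regex("COL", ignore_case = TRUE))   # TRUE
str_detect("col1", fixed("COL", ignore_case = TRUE))   # TRUE
```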