Adding parentheses in str_match changes the match


I am trying to extract the content between two slashes in a URL, and for this I am using the stringr function str_match.

library(stringr)
test <- "http://www.lefigaro.fr/flash-actu/2014/04/08/97001-20140408FILWWW00162-ump-cope-defend-sa-gestion-financiere.php"

I manage to extract the full string:

str_match(test, "http://.*?/.*?/")

     [,1]                                
[1,] "http://www.lefigaro.fr/flash-actu/"

But when I add parentheses to capture the part between the slashes, the overall match changes unexpectedly:

str_match(test, "http://.*?/(.*?)/")

     [,1]                                      [,2]  
[1,] "http://www.lefigaro.fr/flash-actu/2014/" "2014"

It must be a matter of how the parentheses are interpreted in the regex. Any clue?

Best answer

Maybe it will work if you replace (.*?) with ([^/]*?).

  • . matches any character, including a /
  • [^/] matches any single character that is not a /, so the group cannot run past the next slash

I'm not used to stringr, but that is what I'd do in PHP with the preg_ functions.
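In stringr, a minimal sketch of that idea could look like the following. It assumes the stringr package is installed, and it goes one step further than the suggestion above by using [^/]* for the host part as well, so that neither piece can cross a slash however the engine treats lazy quantifiers. The variable names m, full_match and segment are only illustrative.

```r
library(stringr)

test <- "http://www.lefigaro.fr/flash-actu/2014/04/08/97001-20140408FILWWW00162-ump-cope-defend-sa-gestion-financiere.php"

# [^/]* matches a run of non-slash characters, so each part of the
# pattern has to stop at the next "/" instead of relying on laziness.
m <- str_match(test, "http://[^/]*/([^/]*)/")

full_match <- m[1, 1]  # column 1 holds the whole match
segment    <- m[1, 2]  # column 2 holds the first capture group

print(segment)
```

Since str_match returns a character matrix, the captured path segment is read from the second column rather than from the full match in the first.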

Hope it helps