I am trying to extract the content between two slashes in a URL, and for this I am using the stringr function str_match.
library(stringr)
test <- "http://www.lefigaro.fr/flash-actu/2014/04/08/97001-20140408FILWWW00162-ump-cope-defend-sa-gestion-financiere.php"
I manage to extract the full string:
str_match(test, "http://.*?/.*?/")
     [,1]
[1,] "http://www.lefigaro.fr/flash-actu/"
But when I add the parentheses to extract the match within the string the result changes unexpectedly:
str_match(test, "http://.*?/(.*?)/")
     [,1]                                      [,2]
[1,] "http://www.lefigaro.fr/flash-actu/2014/" "2014"
Must be a matter of how the parentheses are interpreted in regex. Any clue?
Maybe if you replace (.*?) with ([^/]*?), it will work.
The . matches any character, whereas [^/] matches only characters that are not a /, so the capture group cannot extend past the next slash.
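Here is a minimal sketch of that suggestion applied to the URL from the question. It assumes str_match handles a negated character class the way other regex engines do; the variable names url, m and segment are only for illustration. Because [^/] can never cross a slash, the lazy ? is no longer needed and is dropped here.

```r
library(stringr)

# URL taken from the question
url <- "http://www.lefigaro.fr/flash-actu/2014/04/08/97001-20140408FILWWW00162-ump-cope-defend-sa-gestion-financiere.php"

# [^/]* matches a run of non-slash characters, so each piece stops
# at the next "/" regardless of greedy or lazy matching
m <- str_match(url, "http://[^/]*/([^/]*)/")

# Column 1 holds the full match, column 2 the first capture group
segment <- m[1, 2]
print(segment)
```

With this pattern the capture group should hold the first path segment only, whichever way the engine resolves lazy quantifiers.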
I'm not used to stringr, but that is what I'd do in PHP with the preg_ functions.
Hope it helps.