Regex to find multiple time patterns in a string with R


Say I have a string like this: 03:30-12:20, 12:30-15:0015:30-18:00, and I need to break it into an array: 03:30 12:20 12:30 15:00 15:30 18:00

Can anyone suggest what regex and R function I should use to do so? Thanks!


I contributed a regex solely for this in the qdapRegex package.

library(qdapRegex)
x <- '03:30-12:20, 12:30-15:0015:30-18:00'
rm_time(x, extract = TRUE)[[1]]
# [1] "03:30" "12:20" "12:30" "15:00" "15:30" "18:00"

Try:

string <- '03:30-12:20, 12:30-15:0015:30-18:00'
regmatches(string, gregexpr('\\d\\d:\\d\\d', string))
# [[1]]
# [1] "03:30" "12:20" "12:30" "15:00" "15:30" "18:00"

Notice that each colon always has two digits on either side. We express that with the regex shorthand \\d, which matches a single digit (the backslash is doubled because it sits inside an R string). [0-9], used in the other answer, is just as good, if not better for advanced regex tokenizing operations. I used \\d to show another route to the same goal.
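To make the comparison concrete, here is a small sketch that runs both spellings on the question's string. The names with_d and with_class are illustrative only, and the equivalence is only claimed for plain ASCII digits like these.

```r
string <- "03:30-12:20, 12:30-15:0015:30-18:00"

# \\d is shorthand for "one digit"; [0-9] spells the same set out explicitly.
with_d     <- regmatches(string, gregexpr("\\d\\d:\\d\\d", string))[[1]]
with_class <- regmatches(string, gregexpr("[0-9][0-9]:[0-9][0-9]", string))[[1]]

# Both patterns pick out the same six times from this input.
identical(with_d, with_class)
# [1] TRUE
```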

You can also specify how many digits should be matched with curly braces. In this case, we are looking for two digits on each side of the colon:

regmatches(string, gregexpr('\\d{2}:\\d{2}', string))
# [[1]]
# [1] "03:30" "12:20" "12:30" "15:00" "15:30" "18:00"
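regmatches() returns a list with one element per input string, which is why the printed result starts with [[1]]. Since the question asks for a flat array, a minimal sketch of pulling out the character vector follows; the names matches and times are illustrative.

```r
string <- "03:30-12:20, 12:30-15:0015:30-18:00"

# gregexpr() locates every match; regmatches() returns the matched text
# as a list with one element per input string.
matches <- regmatches(string, gregexpr("\\d{2}:\\d{2}", string))

# There is a single input string, so take the first element
# to get a plain character vector.
times <- matches[[1]]
times
# [1] "03:30" "12:20" "12:30" "15:00" "15:30" "18:00"
```

With several input strings, unlist(matches) would flatten all of the matches into one vector instead.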
Extract the matches directly:

input <- '03:30-12:20, 12:30-15:0015:30-18:00'
regmatches(input, gregexpr('\\d{2}:\\d{2}', input))

Or insert a separator where two times run together, then split on the separators:

strsplit(gsub("(\\d{2})(?=\\d{2})", "\\1,", input, perl = TRUE), "[, -]+")
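A quick way to check that the matching route and the splitting route agree on the question's string is sketched below. The names by_match, separated and by_split are illustrative, and trimws() is used here as a guard against stray spaces around the pieces.

```r
input <- "03:30-12:20, 12:30-15:0015:30-18:00"

# Route 1: extract every HH:MM match directly.
by_match <- regmatches(input, gregexpr("\\d{2}:\\d{2}", input))[[1]]

# Route 2: put a comma after any two digits that are immediately followed
# by two more digits (the "15:0015:30" case), then split on commas and hyphens.
separated <- gsub("(\\d{2})(?=\\d{2})", "\\1,", input, perl = TRUE)
by_split  <- trimws(strsplit(separated, ",|-")[[1]])

identical(by_match, by_split)
# [1] TRUE
```

Extracting matches tends to be the more forgiving route, since it does not depend on which separators happen to appear between the times.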