How to know if a match is adjacent to the previous match


In a construction like

string.scan(regex){...}

or

string.gsub(regex){...}

how can I check whether the match for a given loop cycle is adjacent to the previous one in the original string? For example, in

"abaabcaaab".scan(/a+b/){|match|
    ...
    continued = ...
    ...
}

there will be three matches: "ab", "aab", and "aaab". During each cycle, I want the variable continued to be false, true, and false respectively, because "ab" is the first match, "aab" is adjacent to it, and "c" interrupts before the next match "aaab".

"ab" #=> continued = false
"aab" #=> continued = true
"aaab" #=> continued = false

Is there an anchor in Oniguruma that refers to the end of the previous matching position? If so, that could be used in the regex. If not, I probably need to use something like MatchData#offset and do some calculation in the loop.

By the way, what is \G in Oniguruma regex? I had the impression that it might be the anchor that I want, but I am not sure what it is.


There are 2 best solutions below

2
On

I don't believe the offset data is available using those methods. You'll probably have to use Regexp#match, passing the start position each time. The returned MatchData object also contains all the info you need to do any substitutions.
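A minimal sketch of that loop, assuming the sample string and regex from the question. The names flags, pos, and prev_end are made up for illustration, and the loop assumes the regex never matches an empty string (otherwise pos would stop advancing):

```ruby
str = "abaabcaaab"
regex = /a+b/

flags = []      # [matched text, continued?] pairs, kept for inspection
pos = 0         # where the next search starts
prev_end = nil  # end offset of the previous match; nil before the first match

# Regexp#match(str, pos) starts searching at pos and returns nil when no
# further match exists. Offsets in the MatchData are relative to the whole string.
while (m = regex.match(str, pos))
  continued = (m.begin(0) == prev_end)  # adjacent if it starts where the last one ended
  flags << [m[0], continued]
  prev_end = pos = m.end(0)
end

flags.each { |text, continued| puts "#{text.inspect} continued = #{continued}" }
```

Comparing m.begin(0) with the previous m.end(0) is the whole adjacency test; the first match is never "continued" because prev_end starts as nil.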

Of course, you'll have to be careful if you are incrementing offsets while also doing string substitutions and the replacement is not the same length as the match. A common pattern here is to walk the string backwards, but I don't think you'll be able to follow that pattern with these methods, so you'll need to adjust the offsets.

EDIT | Actually, you would be able to walk the string backwards, if you do the replacement in a completely separate step. First find everything you need to replace, along with the offsets. Next, iterate that list in reverse order, doing your substitutions.
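As a rough illustration of the two-step approach, here is a sketch that first records the offsets of every match and then substitutes from the last match to the first. The replacement text (the match upcased and wrapped in angle brackets) and the names spans and result are invented for the example, and the loop again assumes the regex cannot match an empty string:

```ruby
str = "abaabcaaab"
regex = /a+b/

# Step 1: collect [start, end] offsets for every match, left to right.
spans = []
pos = 0
while (m = regex.match(str, pos))
  spans << [m.begin(0), m.end(0)]
  pos = m.end(0)
end

# Step 2: substitute from the last match to the first. Editing the tail of
# the string first means the offsets recorded for earlier matches stay valid
# even though each replacement is longer than the text it replaces.
result = str.dup
spans.reverse_each do |start_idx, end_idx|
  result[start_idx...end_idx] = "<#{str[start_idx...end_idx].upcase}>"
end

puts result
```

Because every edit happens to the right of the spans still waiting to be processed, no offset adjustment is needed.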

0
On

StringScanner would be well suited to this task: http://corelib.rubyonrails.org/classes/StringScanner.html

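require 'strscan'
s = StringScanner.new('abaabcaaab')

begin
  puts s.pos             # scan position before this search
  s.scan_until(/a+b/)    # advance to the end of the next match, if any
  puts s.matched         # last matched text; nil after a failed search
                         # (Ruby 1.9+ prints a blank line for nil)
end while !s.matched.nil?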

outputs

0
ab
2
aab
5
aaab
10
nil

So you could keep track of the position after the previous match and compare it with the start of the current match (the current position minus the length of the match). If the two are equal, the matches are adjacent.
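One way that bookkeeping might look, as a sketch rather than a definitive implementation. The names prev_end, match_start, and results are illustrative:

```ruby
require 'strscan'

s = StringScanner.new('abaabcaaab')
prev_end = nil   # scanner position right after the previous match
results = []     # [matched text, continued?] pairs

# scan_until returns nil once no further match can be found.
while s.scan_until(/a+b/)
  # After scan_until, pos sits at the end of the match, so the match
  # started matched_size positions earlier.
  match_start = s.pos - s.matched_size
  continued = (match_start == prev_end)
  results << [s.matched, continued]
  prev_end = s.pos
end

results.each { |text, continued| puts "#{text.inspect} continued = #{continued}" }
```

Both pos and matched_size are counted in bytes, so the subtraction stays consistent even for non-ASCII input.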