I want to speed up my program written in Go and convert regular expressions to finite state machines with ragel. I cannot figure out how to match end of input correctly when converting regular expressions similar to /cat[s]?(\b|$)/ (it matches a word border or end of input), so I made this workaround:
package main
import(
"strings"
"fmt"
"unicode"
)
func Match(data []byte) bool {
data = []byte(strings.ToLower(string(data)))
%%{
machine test;
write data;
}%%
cs, p, pe, eof := 0, 0, len(data), len(data)
_ = eof
var matchPos int
%%{
main := ('cat' 's'?) @{matchPos = p};
write init;
write exec;
}%%
return checkMatch(data, matchPos+1)
}
func checkMatch(data []byte, p int) bool {
if p == len(data) {
return true
}
tail := string(data[p:])
c := []rune(tail)[0]
if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
return true
}
return false
}
func main () {
vs := []string{
"cat",
"cats",
"cat and dog",
"cats and dogs",
"caterpillar",
}
for _, v := range vs {
fmt.Printf("'%s': %v\n", v, Match([]byte(v)))
}
}
the output is correct:
'cat': true
'cats': true
'cat and dog': true
'cats and dogs': true
'caterpillar': false
I do think there is a better way though. What is a "proper" way to handle end of input with ragel?

The proper way to handle end of input is to use EOF actions, of course. And use actions in general, like this (reduced
Matchfunction):Which produces the following output (notice one important test case added):
And works like this:
What it adds is
setMatchedaction that is triggered either by the EOF in one of the final states (%/setMatched) of the first machine (cats?), or upon entering (>setMatched) the second one (which is almost\b, but can actually be replaced with the internalspacemachine). And it eliminates the need forcheckMatchcompletely.