I'm interested in using lex to tokenize my input string, but I do not want it to be possible to "fail". Instead, I want to have some type of DEFAULT or TEXT token, which would contain all the non-matching characters between recognized tokens.
Anyone have experience with something like this?
Use the pattern `.` at the end of all your lex rules to match any character that isn't matched by any other rule. You may also need a `\n` rule to match newlines (a newline is the only character that `.` doesn't match).

If you want to combine adjacent non-matching characters into a single token, that is harder, and is more easily done in the parser.
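As a rough illustration of the catch-all approach, here is a minimal sketch of a lex specification. The token names (`TOK_NUMBER`, `TOK_IDENT`, `TOK_TEXT`) and the two "real" rules are assumptions made up for this example; substitute whatever your grammar actually recognizes.

```lex
%{
#include <stdio.h>

/* Hypothetical token codes for this sketch; a real project would
   normally take these from the parser's generated header. */
enum { TOK_NUMBER = 258, TOK_IDENT, TOK_TEXT };
%}

%%
[0-9]+                  { return TOK_NUMBER; }
[A-Za-z_][A-Za-z0-9_]*  { return TOK_IDENT; }
\n                      { return TOK_TEXT; }  /* "." does not match a newline */
.                       { return TOK_TEXT; }  /* catch-all: keep this rule last */
%%

/* Tell the scanner there is no further input after end of file. */
int yywrap(void) { return 1; }

/* Print each token code with its matched text. */
int main(void)
{
    int tok;
    while ((tok = yylex()) != 0)
        printf("%d \"%s\"\n", tok, yytext);
    return 0;
}
```

The catch-all goes last because lex prefers the longest match and, on a tie, the rule listed first, so a one-character match by `.` only wins when nothing else matches. Each unmatched character comes back as its own `TOK_TEXT` here; joining adjacent ones is left to the code that consumes the tokens.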