I am writing a parser using PLY. My question is similar to "How to write a regular expression to match a string literal where the escape is a doubling of the quote character?". However, I use a double quote to open and close a string, and a backslash to escape quotes inside it. For example:
"I do not know what \"A\" is"
I define the normal string lexer as:
t_NORMSTRING = r'"([^"\n]|(\\"))*"$'
and I have another lexer for a variable:
def t_VAR(t):
    r'[a-zA-Z_][a-zA-Z_0-9]*'
    return t
The problem is that my lexer does not recognize "I do not know what \"A\" is" as a NORMSTRING token. It reports these errors:
Illegal character '"' at 1
Syntax error at 'LexToken(VAR,'do',10,210)'
Please let me know why it is not correct.
Having explored this with a small PLY program, I think your problem comes from the difference between raw and non-raw strings in how the input data is read, not from PLY's parsing or lexical matching. (As a side note, Python 2 and Python 3 differ slightly in this area of string handling; I have restricted my code to Python 2.)
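The raw versus non-raw distinction can be reproduced without PLY. The following is a minimal sketch, not the answer's original program: it uses Python 3 and the standard re module in place of the lexer, and the names NORMSTRING, raw_data and cooked_data are chosen here for illustration. It only shows that the pattern from the question matches when the backslashes are still present in the data, and fails once Python has already processed them.

```python
import re

# The pattern from the question, unchanged.
NORMSTRING = r'"([^"\n]|(\\"))*"$'

# Raw literal: the backslashes survive, as they would in text read from a file.
raw_data = r'"I do not know what \"A\" is"'

# Non-raw literal: Python turns each \" into a bare " before any lexer sees it.
cooked_data = '"I do not know what \"A\" is"'

# The two values differ only by the two backslashes that were consumed.
print(repr(raw_data))
print(repr(cooked_data))

# re.match tries the pattern at the start of the data (used here as a
# stand-in for the lexer, which is an assumption of this sketch).
print(re.match(NORMSTRING, raw_data) is not None)     # True
print(re.match(NORMSTRING, cooked_data) is not None)  # False
```

In the second case the first inner quote ends the string early, the $ anchor then fails, and a lexer with no other rule for a double quote would report it as an illegal character.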
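You only get the error you are seeing if you use a non-raw string, or use input instead of raw_input. In Python 2, input evaluates what you type as a Python expression, so each \" is reduced to a bare " before the lexer sees it, whereas raw_input returns the text unchanged. This was shown by my example code and results.
As a final note, you probably do not need the end-of-string anchor $ in your regular expression.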
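One caveat on dropping the anchor, sketched with Python 3 and the re module rather than PLY. With the $ in place the pattern can only match a string that ends the input. With the $ simply deleted, the first alternative [^"\n] is free to consume the backslash, so the match stops at the first escaped quote. The ESCAPE_AWARE pattern below is a hypothetical alternative that is not taken from the question or the answer; it excludes the backslash from the character class and treats a backslash plus the following character as one unit.

```python
import re

ANCHORED = r'"([^"\n]|(\\"))*"$'     # pattern from the question
UNANCHORED = r'"([^"\n]|(\\"))*"'    # same pattern with only the $ removed
ESCAPE_AWARE = r'"([^"\\\n]|\\.)*"'  # hypothetical alternative (assumption)

# A string literal followed by more input, as in a larger expression.
text = r'"a \"b\" c" + x'

def matched(pattern, data):
    """Return the text matched at the start of data, or None."""
    m = re.match(pattern, data)
    return m.group(0) if m else None

print(matched(ANCHORED, text))      # None: the string is not at the end of the input
print(matched(UNANCHORED, text))    # "a \"   (stops at the first escaped quote)
print(matched(ESCAPE_AWARE, text))  # "a \"b\" c"
```

If a pattern like this were used as t_NORMSTRING, it would still need to be checked inside the actual PLY lexer, which this sketch does not do.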