I am trying to create a JavaCC parser and I am facing an issue.
I want to return everything between a pair of parentheses in my text, but the string between those parentheses may itself contain parentheses.
For example, given this line:
Node(new MB34(MB78, MB654) => (MB7, M9))
I want the string "new MB34(MB78, MB654) => (MB7, M9)". The text between the parentheses follows no specific pattern.
I have tried to use lexical states, following the JavaCC documentation:
SKIP :
{ " " | "\t" | "\n" | "\r" | "\f" | "\r\n" }
TOKEN :
{
    < #LETTER : ( [ "a"-"z" ] | [ "A"-"Z" ] ) >
  | < #DIGIT : [ "0"-"9" ] >
  | < #ALPHA : ( < LETTER > | < DIGIT > ) >
  | < IDENTIFIER : < LETTER > ( < ALPHA > )* >
}
TOKEN : {
    < "(" > : IN_LABEL
}
< IN_LABEL > TOKEN : {
    < TEXT_LABEL : ~[] >
}
< IN_LABEL > TOKEN : {
    < END_LABEL : ")" > : DEFAULT
}
String LABEL():
{
    Token token_label;
    String label = "";
}
{
    < IDENTIFIER >
    "(" ( token_label = < TEXT_LABEL > { label += token_label.toString(); } )+ < END_LABEL >
    {
        return label;
    }
}
However, this doesn't work: the string that leaves the lexical state IN_LABEL is the single character ")", which TEXT_LABEL also matches, so the parser consumes all the text without ever returning to the DEFAULT state. As a temporary workaround, I replaced the END_LABEL token with:
< IN_LABEL > TOKEN : {
    < END_LABEL : ~[] ")" > : DEFAULT
}
But this doesn't work either, because the token can match before the real end of the label, for instance at the ")" that closes "MB34(MB78, MB654)".
Does anyone have a solution to this problem?
There may be a simpler solution, but here's mine:
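As a rough illustration only (plain Java rather than a JavaCC grammar, and not necessarily the approach this answer takes), the nested case can be handled by counting parenthesis depth and treating a ")" as the end of the label only when the depth drops back to zero. The class name `LabelExtractor` and the method `extractLabel` are hypothetical names chosen for this sketch.

```java
final class LabelExtractor {

    /**
     * Returns the text between the first "(" in the input and its matching ")",
     * or null if there is no "(" or the parentheses are unbalanced.
     */
    static String extractLabel(String input) {
        int start = input.indexOf('(');
        if (start < 0) {
            return null; // no opening parenthesis at all
        }
        int depth = 0;
        for (int i = start; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(') {
                depth++; // entering one more level of nesting
            } else if (c == ')') {
                depth--; // leaving one level of nesting
                if (depth == 0) {
                    // This ")" closes the outermost "(": the label ends here.
                    return input.substring(start + 1, i);
                }
            }
        }
        return null; // input ended before the outermost "(" was closed
    }

    public static void main(String[] args) {
        // Prints: new MB34(MB78, MB654) => (MB7, M9)
        System.out.println(extractLabel("Node(new MB34(MB78, MB654) => (MB7, M9))"));
    }
}
```

In a JavaCC grammar, the same counter could plausibly live in a TOKEN_MGR_DECLS block and be updated from lexical actions in the IN_LABEL state, calling SwitchTo(DEFAULT) only when the depth reaches zero. That mapping is an assumption of this sketch, not something verified against the grammar above.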