I am very new to ANTLR. I am developing a parser for the GURU language. I wrote a grammar and decided to check it on the website: site. I don't understand why I get errors.
Here is my grammar:
grammar Guru;
/*
PARSER RULES
*/
expertSystem : definition initialization completion rules variables EOF;
// Definition
definition : GOAL ':' expertiseVariable;
// Initialization
initialization : INITIAL ':' (output | assignment | input)+;
// Completion
completion : DO ':' (assignment | output)+;
// Rules
rules : (rule)+;
rule : RULE ':' ruleName
(auxiliaryElement)* (ready)*
IF ':' premise
THEN ':' conclusion
(reason)* (usedVariables)*;
ruleName : IDENTIFIER;
auxiliaryElement : priority | cost | test | comment;
priority : PRIORITY ':' RANGE;
cost : COST ':' RANGE;
test : TEST ':' testValue;
testValue : 'S' | 'E' | 'P';
comment : COMMENT ':' text;
ready : READY ':' (readyCommand)+;
readyCommand : output | assignment;
// TODO: LOGOPERATOR
premise : andExpression;
andExpression : orExpression ('AND' orExpression)*;
orExpression : atomicExpression ('OR' atomicExpression)*;
atomicExpression : '(' premise ')' | comparisonExpression;
comparisonExpression : comparisonOperand COMOPERATOR comparisonOperand;
comparisonOperand : expertiseVariable | value | (function '(' expertiseVariable ')');
conclusion : (assignment)+;
reason : REASON ':' text;
usedVariables : needs | changes;
needs : NEEDS ':' '{' expertiseVariable (',' expertiseVariable)* '}';
changes : CHANGES ':' '{' expertiseVariable (',' expertiseVariable)* '}';
// Variables
variables : (variable)+;
variable : VAR ':' expertiseVariable (variableCommand)*;
variableCommand : find | label | when | cfType | rigor | limit;
find : FIND ':' (findCommand)+;
findCommand : assignment | input;
label : LABEL ':' text;
when : WHEN ':' whenValue;
whenValue : 'F' | 'L' | 'N';
cfType : CFTYPE ':' cfTypeValue cfTypeValue;
cfTypeValue : 'M' | 'P';
rigor : RIGOR ':' rigorValue;
rigorValue : 'M' | 'C' | 'A';
limit : LIMIT ':' NUMBER;
// General rules
output : OUTPUT ':' text;
assignment : expertiseVariable '=' value;
input : INPUT ':' expertiseVariable TYPE ':' TYPES WITH ':' text;
expertiseVariable : IDENTIFIER;
function : IDENTIFIER;
value : STRING | NUMBER;
text: STRING;
/*
LEXER RULES
*/
GOAL : 'GOAL';
INITIAL : 'INITIAL';
DO : 'DO';
RULE : 'RULE';
IF : 'IF';
THEN : 'THEN';
PRIORITY : 'PRIORITY';
COST: 'COST';
TEST: 'TEST';
COMMENT : 'COMMENT';
READY : 'READY';
REASON : 'REASON';
NEEDS : 'NEEDS';
CHANGES : 'CHANGES';
VAR : 'VAR';
FIND : 'FIND';
LABEL : 'LABEL';
WHEN : 'WHEN';
CFTYPE : 'CFTYPE';
RIGOR : 'RIGOR';
LIMIT : 'LIMIT';
OUTPUT : 'OUTPUT';
INPUT : 'INPUT';
TYPE : 'TYPE';
WITH : 'WITH';
IDENTIFIER : [a-zA-Z_][a-zA-Z0-9_]*;
STRING : '"' ~["]* '"';
RANGE : [1-9] [0-9]? | '100';
NUMBER : '0' | [1-9][0-9]*;
TYPES : 'NUM' | 'STRING' | 'REAL';
LOGOPERATOR : 'AND' | 'OR';
COMOPERATOR : '>' | '<' | '>=' | '<=' | '==';
WS : [ \t\r\n]+ -> skip;
Here is the text I wanted to check:
GOAL: RESH
INITIAL:
OUTPUT: "Some text"
COMPLETION:
DO: OUTPUT: "Some text"
RULE: R1
IF: RESH < 20
THEN: RESH = 20
VAR: RESH
Here are the errors it produces:
1:4 token recognition error at: ':'
2:7 token recognition error at: ':'
3:10 token recognition error at: ':'
3:12 token recognition error at: '"'
3:22 token recognition error at: '"'
4:10 token recognition error at: ':'
5:6 token recognition error at: ':'
5:14 token recognition error at: ':'
5:16 token recognition error at: '"'
5:26 token recognition error at: '"'
6:4 token recognition error at: ':'
7:2 token recognition error at: ':'
7:9 token recognition error at: '<'
8:4 token recognition error at: ':'
9:3 token recognition error at: ':'
1:0 mismatched input 'GOAL' expecting 'GOAL'
What is the problem? I have already rewritten the grammar several times, but without success.
Most of the errors originate from the fact that you did not clear the rules in the "lexer tab" in ANTLR lab. When you do that, many of the errors will disappear.
The problems that remain are then these:
Given the 2 lexer rules RANGE and NUMBER (defined in that order), the input 20 will always become a RANGE token. That is simply how ANTLR produces tokens: it tries to match as many characters as possible for every lexer rule, and when 2 (or more) lexer rules match the same characters, the one defined first "wins".
The solution: remove RANGE and replace all RANGEs in the parser rules with NUMBERs. Then, after parsing, you can perform some semantic checks to see whether a NUMBER is valid in certain places or not. You can do this in an ANTLR listener.
The second problem is that the parser does not recognize the input COMPLETION: and, given the grammar, I do not see what parser rule you are trying to match for this input. The parser rule completion seems to be missing the keyword COMPLETION and a : at the start. A possible solution is to add a COMPLETION lexer rule and to start the completion parser rule with COMPLETION ':'.
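As a rough sketch of the two changes described above (dropping RANGE in favour of NUMBER, and adding the missing COMPLETION keyword), the affected rules might look like the following. The token name COMPLETION and the exact shape of the completion rule are assumptions based on the sample input; every rule not shown stays as it is in the question.

```antlr
// Parser rules that change (assumed shape, not a definitive fix)
completion : COMPLETION ':' DO ':' (assignment | output)+;
priority   : PRIORITY ':' NUMBER;   // was RANGE; validate 1..100 after parsing
cost       : COST ':' NUMBER;       // was RANGE; validate 1..100 after parsing

// Lexer rules that change
COMPLETION : 'COMPLETION';          // hypothetical new keyword; place it before IDENTIFIER
// RANGE is removed entirely, so 20 is always tokenized as NUMBER
NUMBER     : '0' | [1-9][0-9]*;
```

The keyword rule has to come before IDENTIFIER for the same "first rule wins" reason: both match the characters COMPLETION, and the one defined first decides the token type. The range check for priority and cost then moves out of the grammar and into a listener that runs after parsing.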