I want to write a lexer for a subset of C. Here is the code. I tested it with inputs such as `3l12` and `012977`, which are an invalid identifier and an invalid octal integer respectively, so the lexer should return `WRONG` for them. However, it splits them into `3` `l12` and `012` `977` and recognizes each piece as a separate valid token.
```lex
ID [a-z_A-Z][a-z_A-Z0-9]*
EXP ([Ee][\+\-]?[0-9]+)
FLOAT (([0-9]*\.[0-9]+|[0-9]+\.){EXP}?[fF]?)|[0-9]+{EXP}
INT ([1-9][0-9]*|0[0-7]*|0[Xx][0-9a-fA-F]+)
HEXFLOAT 0[Xx][0-9a-fA-F]*\.[0-9a-fA-F]*[Pp][\+\-]?[0-9]+
MultilineComment "/*"([^\*]|(\*)*[^\*/])*(\*)*"*/"
SingleLineComment "//".*
%%
"int" {return INT;}
"float" {return FLOAT;}
"void" {return VOID;}
"const" {return CONST;}
"return" {return RETURN;}
"if" {return IF;}
"else" {return ELSE;}
"while" {return WHILE;}
"break" {return BREAK;}
"continue" {return CONTINUE;}
">" {return GT;}
"<" {return LT;}
">=" {return GE;}
"<=" {return LE;}
"==" {return EQ;}
"!=" {return NEQ;}
"(" {return LP;}
")" {return RP;}
"[" {return LB;}
"]" {return RB;}
"{" {return LC;}
"}" {return RC;}
"," {return COMMA;}
";" {return SEMICOLON;}
"!" {return NOT;}
"=" {return ASSIGN;}
"-" {return MINUS;}
"+" {return ADD;}
"*" {return MUL;}
"/" {return DIV;}
"%" {return MOD;}
"&&" {return AND;}
"||" {return OR;}
{ID} {return ID;}
{INT} {return INT_LIT;}
{FLOAT}|{HEXFLOAT} {return FLOAT_LIT;}
[ \t\n]+ {}
{SingleLineComment} {}
{MultilineComment} {}
. {return WRONG;}
```
I tried to fix it by changing the order of the rules and by removing the rule that skips whitespace, but neither worked. How can I fix this problem? Do I need to add rules specifically for these kinds of errors?
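The splitting is consistent with flex's documented matching policy: at each position it picks the rule that matches the most text, and only when two rules match the same length does the one listed first win. None of the patterns above can match all of `3l12`, so the longest match is `3` (via `{INT}`) and scanning restarts at `l12`; reordering rules only changes how ties are broken, and the whitespace rule never comes into play for this input. One common workaround, offered here as an untested assumption rather than a verified fix, is to add error patterns that cover the whole malformed lexeme so they win on length, for example `[0-9]+[a-zA-Z_][a-zA-Z_0-9]*` and `0[0-7]*[89][0-9]*`, both with `{return WRONG;}`, placed after the `{INT}` and `{FLOAT}` rules so that a valid literal of equal length such as `1e5` or `0x1F` still wins the tie. The C code below does not use flex: it is a hypothetical stand-alone simulation of that rule-selection policy, with hand-written matchers (`match_id`, `match_int`, `match_bad_id`, `match_bad_octal` and `tokenize` are all made-up names) that model only the ID rule and the decimal/octal part of INT.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* A matcher returns the length of the longest match of its pattern at the
   start of s, or 0 if the pattern does not match there. */
typedef size_t (*matcher)(const char *s);

/* [1-9][0-9]*|0[0-7]*  : decimal/octal part of the question's {INT} */
static size_t match_int(const char *s) {
    size_t n = 0;
    if (s[0] >= '1' && s[0] <= '9') {
        n = 1;
        while (isdigit((unsigned char)s[n])) n++;
    } else if (s[0] == '0') {
        n = 1;
        while (s[n] >= '0' && s[n] <= '7') n++;
    }
    return n;
}

/* [a-z_A-Z][a-z_A-Z0-9]*  : the question's {ID} */
static size_t match_id(const char *s) {
    size_t n = 0;
    if (isalpha((unsigned char)s[0]) || s[0] == '_') {
        n = 1;
        while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
    }
    return n;
}

/* Hypothetical error rule [0-9]+[a-zA-Z_][a-zA-Z_0-9]* : digits glued to an identifier */
static size_t match_bad_id(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    if (n == 0 || !(isalpha((unsigned char)s[n]) || s[n] == '_')) return 0;
    n++;
    while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
    return n;
}

/* Hypothetical error rule 0[0-7]*[89][0-9]* : an octal literal containing 8 or 9 */
static size_t match_bad_octal(const char *s) {
    if (s[0] != '0') return 0;
    size_t n = 1;
    while (s[n] >= '0' && s[n] <= '7') n++;
    if (s[n] != '8' && s[n] != '9') return 0;
    n++;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Rule tables, listed in the same order the rules would appear in the .l file. */
static const matcher original_rules[] = {match_id, match_int};
static const char *const original_names[] = {"ID", "INT_LIT"};
static const matcher extended_rules[] = {match_id, match_int, match_bad_id, match_bad_octal};
static const char *const extended_names[] = {"ID", "INT_LIT", "WRONG", "WRONG"};

/* Simulates flex's rule selection on src: at each position the longest match
   wins and, on a tie, the rule listed first wins. Whitespace is skipped like
   the [ \t\n]+ rule, and a character no rule matches becomes WRONG like the
   catch-all "." rule. Tokens are written to out as "NAME(text)", separated
   by single spaces (truncated if out is too small). */
static void tokenize(const char *src, const matcher *rules,
                     const char *const *names, int nrules,
                     char *out, size_t cap) {
    assert(cap > 0);
    out[0] = '\0';
    for (const char *p = src; *p;) {
        if (isspace((unsigned char)*p)) { p++; continue; }
        size_t best_len = 0;
        int best = -1;
        for (int i = 0; i < nrules; i++) {
            size_t len = rules[i](p);
            if (len > best_len) { best_len = len; best = i; } /* '>' keeps the earlier rule on ties */
        }
        const char *name = best >= 0 ? names[best] : "WRONG";
        if (best < 0) best_len = 1;
        size_t used = strlen(out);
        snprintf(out + used, cap - used, "%s%s(%.*s)",
                 used ? " " : "", name, (int)best_len, p);
        p += best_len;
    }
}
```

With the original two-rule table, `tokenize("3l12", original_rules, original_names, 2, buf, sizeof buf)` yields `INT_LIT(3) ID(l12)` and `012977` yields `INT_LIT(012) INT_LIT(977)`, mirroring the behaviour described above. With `extended_rules` (4 rules) the same inputs become `WRONG(3l12)` and `WRONG(012977)`, while valid input such as `x9 017 42` still lexes as `ID(x9) INT_LIT(017) INT_LIT(42)`, because the error patterns only win when they match strictly more text than every valid rule.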