Input buffer overflow in scanner for long comments

I have defined a LEX scanner with the following rule for scanning (nested) comments:

"(*" {
    int linenoStart, level, ch;

    linenoStart = yylineno;
    level = 1;
    do {
        ch = input();
        switch (ch) {
            case '(':
                ch = input();
                if (ch == '*') {
                    level++;
                } else {
                    unput(ch);
                }
                break;
            case '*':
                ch = input();
                if (ch == ')') {
                    level--;
                } else {
                    unput(ch);
                }
                break;
            case '\n':
                yylineno++;
                break;
        }
    } while ((level > 0) && (ch > 0));
    assert((ch >= 0) || (ch == EOF));
    
    if (level > 0) {
        fprintf(stderr, "error: unterminated comment starting at line %d\n", linenoStart);
        exit(EXIT_FAILURE);
    }
}

When I generate the scanner with flex 2.6.4 and run it on an input file containing a comment longer than 16382 characters, I get the following error message:

input buffer overflow, can't enlarge buffer because scanner uses REJECT

Why is that and how can the problem be resolved?

Accepted answer by August Karlstrom:

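Characters read with input() inside an action still belong to the current match, so flex has to keep everything from the start of that match, here (*, in its input buffer. The buffer holds YY_BUF_SIZE = 16384 characters and, as the error message says, flex cannot enlarge it when the scanner uses REJECT. A comment read this way is therefore limited to about YY_BUF_SIZE characters, which matches the observed limit of 16382 plus the two characters of (*. To allow comments of any length we can instead handle the comment with a start condition, so that it is consumed as a series of short matches, like this: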

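%{
/* Declarations assumed by the rules below; `comment` is assumed to be
   an exclusive start condition. */
#include <stdio.h>
#include <stdlib.h>

static int commentNestingLevel; /* current nesting depth inside a comment */
static int commentStartLine;    /* line on which the outermost comment starts */
%}

%x comment

%%

"(*" {
    BEGIN(comment);
    commentNestingLevel = 1;
    commentStartLine = yylineno;
}

<comment>[^*(\n]+    /* skip ordinary comment text */

<comment>\n {
    yylineno++;
}

<comment>"*"+/[^)]    /* asterisks that do not close a comment */

<comment>"("+/[^*]    /* parentheses that do not open a nested comment */

<comment>"(*" commentNestingLevel++;

<comment>"*)" {
    commentNestingLevel--;
    if (commentNestingLevel == 0) {
        BEGIN(INITIAL);
    }
}

<comment><<EOF>> {
    fprintf(stderr, "error: unterminated comment starting at line %d\n", commentStartLine);
    exit(EXIT_FAILURE);
}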
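
The rules in the comment state amount to a small state machine that looks at one or two characters at a time and remembers only a nesting counter, which is why the length of the comment no longer matters. As a way to check that logic outside flex, here is a minimal sketch in plain C. The function name skip_comment and its interface are hypothetical (they are not part of flex or of the scanner above), and it works on an in-memory string only to keep the example short.

```c
#include <assert.h>

/* Hypothetical helper, not part of flex: skips one nested comment.
   `text` must start with "(*". Returns the number of characters up to
   and including the matching "*)", or -1 if the comment is never closed.
   The number of newlines seen is stored in *lines. */
static long skip_comment(const char *text, int *lines)
{
    const char *p = text + 2;   /* position just after the opening "(*" */
    int level = 1;              /* nesting depth, like commentNestingLevel */

    assert(text[0] == '(' && text[1] == '*');
    *lines = 0;

    while (*p != '\0') {
        if (p[0] == '(' && p[1] == '*') {
            level++;            /* nested comment opens */
            p += 2;
        } else if (p[0] == '*' && p[1] == ')') {
            level--;            /* one comment level closes */
            p += 2;
            if (level == 0) {
                return (long)(p - text);
            }
        } else {
            if (*p == '\n') {
                (*lines)++;     /* same bookkeeping as yylineno++ */
            }
            p++;                /* ordinary text, a lone '(' or a lone '*' */
        }
    }
    return -1;                  /* end of input inside the comment */
}
```

For example, skip_comment("(* a (* b *) c *) rest", &n) returns 17, the length of the whole outer comment, and an input such as "(* unterminated (* x *)" returns -1, which corresponds to the <<EOF>> rule above.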