Including whitespace when parsing with Irony

I am writing a parser using the following library: https://www.nuget.org/packages/Irony

My current goal is to parse a file that contains lines of plain text. Each line starts with either a space or a tab character.

This is what my grammar class looks like:

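    using Irony.Parsing;

    // The class name is hypothetical; the rules below live in the grammar's constructor.
    public class TextGrammar : Grammar
    {
        public TextGrammar()
        {
            NonTerminal program = new NonTerminal("program");
            NonTerminal textStatement = new NonTerminal("textStatement");
            NonTerminal textStatements = new NonTerminal("textStatements");

            FreeTextLiteral text = new FreeTextLiteral("text", "\r\n");

            KeyTerm whitespace = ToTerm(" ", "whitespace");
            KeyTerm tab = ToTerm("\t", "tab");   // a real tab character, not two spaces
            KeyTerm newline = ToTerm("\n", "newline");

            // A statement is a leading space or tab, then free text, then a newline.
            textStatement.Rule = (whitespace | tab) + text + newline;
            textStatements.Rule = MakePlusRule(textStatements, textStatement);

            program.Rule = textStatements;
            this.Root = program;
        }
    }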

And this is the content of the target file (the dashed lines are not part of the file):

----------------------
 test

----------------------

Surprisingly, parsing fails with the following message:

Column 1, Line 0:
Syntax error, expected: whitespace, tab

It looks like the grammar is configured to skip spaces and tabs by default, so the parser skips the leading " " and starts at the letter "t". That is fine in most cases, but not in this one: I'm trying to write a Python-like language, so tracking whitespace is important.

I'm not expecting you to write the whole grammar for me; please just suggest a general approach. Any help is appreciated, thanks!

UPD: I ended up overriding two methods of the grammar class like this:

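    // Spaces and tabs are significant, so they are not treated as separators.
    public override bool IsWhitespaceOrDelimiter(char ch)
    {
        if (ch == ' ' || ch == '\t')
            return false;
        return base.IsWhitespaceOrDelimiter(ch);
    }

    // The ' ' and '\t' cases are deliberately omitted, so spaces and tabs
    // are no longer skipped and reach the parser as ordinary input.
    public override void SkipWhitespace(ISourceStream source)
    {
        while (!source.EOF())
        {
            switch (source.PreviewChar)
            {
                case '\r':
                case '\n':
                case '\v':
                    // Line-based languages keep newlines as tokens.
                    if (UsesNewLine) return;
                    break;
                default:
                    return;
            }
            source.PreviewPosition++;
        }
    }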

Best answer:

If you want to handle a space as an explicit character in the grammar, you need to override the IsWhitespaceOrDelimiter method and return false for ' '. Do the same for tab and any other characters the grammar should see.
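To make the mechanism concrete, here is a toy model in plain C# (no Irony dependency) of a scanner's whitespace-skipping step. It is not Irony's implementation: `SkipDemo`, `FirstTokenStart`, `DefaultSkip` and `OverrideSkip` are hypothetical names, and the two predicates are assumptions modelled on the default behaviour described in the question and on the `SkipWhitespace` override in the UPD.

```csharp
using System;

// Toy model of a "skip whitespace" step. NOT Irony's code; all names here
// are hypothetical and exist only for illustration.
public static class SkipDemo
{
    // Advances past every character the predicate says to skip and
    // returns the index where the first real token would start.
    public static int FirstTokenStart(string line, Func<char, bool> isSkipped)
    {
        int pos = 0;
        while (pos < line.Length && isSkipped(line[pos]))
            pos++;
        return pos;
    }

    // Default-style behaviour (assumed): spaces and tabs are skipped.
    public static bool DefaultSkip(char ch) => ch == ' ' || ch == '\t';

    // Mirrors the overridden SkipWhitespace when UsesNewLine is false:
    // line-break characters are still skipped, spaces and tabs are not.
    public static bool OverrideSkip(char ch) => ch == '\r' || ch == '\n' || ch == '\v';

    public static void Main()
    {
        string line = " test";
        int before = FirstTokenStart(line, DefaultSkip);
        int after = FirstTokenStart(line, OverrideSkip);
        Console.WriteLine($"default: starts at column {before} ('{line[before]}')");
        Console.WriteLine($"override: starts at column {after} ('{line[after]}')");
    }
}
```

Running it prints column 1 ('t') for the default predicate and column 0 (' ') for the override-style one. The first result is consistent with the "Column 1" in the question's error message, assuming Irony reports zero-based columns (as "Line 0" suggests).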