Parsing list of lists with Superpower

Question

220 Views Asked by SuperJMN At 17 October 2019 at 21:40

I would like to parse the books of a library expressed in a format like this:

#Book title 1
Chapter 1
Chapter 2
#Book title 2
Chapter 1
Chapter 2
Chapter 3

As you can see, the title of each book is preceded by a # and the chapters of that book are the lines that follow. It should be rather easy to create a parser for this.

So far, I have this code (parsers + tokenizer):

void Main()
{
    var tokenizer = new TokenizerBuilder<PrjToken>()
                    .Match(Superpower.Parsers.Character.EqualTo('#'), PrjToken.Hash)
                    .Match(Span.Regex("[^\r\n#:=-]*"), PrjToken.Text)
                    .Match(Span.WhiteSpace, PrjToken.WhiteSpace)
                    .Build();


    var input = @"#Book 1
Chapter 1
Chapter 2
#Book 2
Chapter 1
Chapter 2
Chapter 3";

    var library = MyParsers.Library.Parse(tokenizer.Tokenize(input));
}


public enum PrjToken
{
    WhiteSpace,
    Hash,
    Text
}


public class Book
{
    public string Title { get; }
    public string[] Chapters { get; }

    public Book(string title, string[] chapters)
    {
        Title = title;
        Chapters = chapters;
    }
}

public class Library
{
    public Book[] Books { get; }

    public Library(Book[] books)
    {
        Books = books;
    }
}


public class MyParsers
{
    public static readonly TokenListParser<PrjToken, string> Text = from text in Token.EqualTo(PrjToken.Text)
                                                                    select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Superpower.Model.Token<PrjToken>> Whitespace = from text in Token.EqualTo(PrjToken.WhiteSpace)
                                                                                   select text;

    public static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from wh in Whitespace
        select text;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Text.ManyDelimitedBy(Whitespace)
        select new Book(title, chapters);

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.ManyDelimitedBy(Whitespace)
        select new Library(books);
}

The above code is ready to run in .NET Fiddle: https://dotnetfiddle.net/3P5dAJ

Everything looks fine. However, something is wrong with the parser because I'm getting this error:

Syntax error (line 4, column 1): unexpected hash #, expected text.

What's wrong with my parsers?

**Fredrik Blom** · Accepted Answer · 2019-10-18T11:37:13.437000

You can solve this by parsing the chapters as a separate list, where each chapter ends with a Whitespace token:

    public static readonly TokenListParser<PrjToken, string> Chapter =
        from chapterName in Text
        from wh in Whitespace
        select chapterName;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Chapter.Many()
        select new Book(title, chapters);

In essence, I think that when Text.ManyDelimitedBy(Whitespace) consumes the trailing whitespace (the newline) at the end of Chapter 2, it expects another chapter name to follow, not the start of a new book.

The parser cannot distinguish the delimiter between chapters from the delimiter between books (both are a newline, tokenized as Whitespace), so after the newline it commits to another chapter and fails when it finds the # instead.

By defining a Chapter as Text followed by a Whitespace token, you remove this ambiguity.
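To make the ambiguity visible, it can help to dump the token stream the parser actually sees. The following is only a minimal sketch, assuming the Superpower NuGet package is referenced. It reuses the tokenizer and the PrjToken enum from the question; the TokenDump class and its Kinds helper are names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

public enum PrjToken { WhiteSpace, Hash, Text }

// Hypothetical helper class, not part of the original question or answer.
public static class TokenDump
{
    // Same recognizers, in the same order, as the tokenizer in the question.
    private static readonly Tokenizer<PrjToken> PrjTokenizer = new TokenizerBuilder<PrjToken>()
        .Match(Character.EqualTo('#'), PrjToken.Hash)
        .Match(Span.Regex("[^\r\n#:=-]*"), PrjToken.Text)
        .Match(Span.WhiteSpace, PrjToken.WhiteSpace)
        .Build();

    // Returns the kind of every token in the input, in order.
    public static List<PrjToken> Kinds(string input)
    {
        var kinds = new List<PrjToken>();
        foreach (var token in PrjTokenizer.Tokenize(input))
        {
            kinds.Add(token.Kind);
        }
        return kinds;
    }

    public static void Main()
    {
        var input = "#Book 1\nChapter 1\nChapter 2\n#Book 2\nChapter 1";

        // Print one line per token: line, column and token kind.
        foreach (var token in PrjTokenizer.Tokenize(input))
        {
            Console.WriteLine($"{token.Position.Line}:{token.Position.Column} {token.Kind}");
        }
    }
}
```

In the printed list, the WhiteSpace token after the last chapter of the first book should be followed directly by a Hash token. That is the point where ManyDelimitedBy has already consumed the delimiter and is looking for another Text token.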

Since the Whitespace at the end of each chapter is now consumed by the Chapter parser, the books are no longer delimited by a Whitespace token, so you have to change the Library parser as well and use Book.Many() instead of Book.ManyDelimitedBy(Whitespace):

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.Many()
        select new Library(books);

In addition, if you want the file to parse even when it has no newline at the end, you also have to make the Whitespace at the end of the Chapter optional:

    public static readonly TokenListParser<PrjToken, string> Chapter =
        from chapterName in Text
        from wh in Whitespace.Optional()
        select chapterName;

The complete parser then looks like this:

public class MyParsers
{
    public static readonly TokenListParser<PrjToken, string> Text = from text in Token.EqualTo(PrjToken.Text)
        select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Superpower.Model.Token<PrjToken>> Whitespace = from text in Token.EqualTo(PrjToken.WhiteSpace)
        select text;

    public static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from wh in Whitespace
        select text;

    public static readonly TokenListParser<PrjToken, string> Chapter =
        from chapterName in Text
        from wh in Whitespace.Optional()
        select chapterName;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Chapter.Many()
        select new Book(title, chapters);

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.Many()
        select new Library(books);
}
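For reference, the complete parser can be exercised end to end roughly as follows. This is a sketch under assumptions rather than a definitive program: it assumes the Superpower NuGet package is referenced, it repeats the tokenizer and types from the question so that it is self-contained, and the Program class with its ParseLibrary helper is a name introduced only for this example.

```csharp
using System;
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

public enum PrjToken { WhiteSpace, Hash, Text }

public class Book
{
    public string Title { get; }
    public string[] Chapters { get; }
    public Book(string title, string[] chapters) { Title = title; Chapters = chapters; }
}

public class Library
{
    public Book[] Books { get; }
    public Library(Book[] books) { Books = books; }
}

public class MyParsers
{
    public static readonly TokenListParser<PrjToken, string> Text =
        from text in Token.EqualTo(PrjToken.Text)
        select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Superpower.Model.Token<PrjToken>> Whitespace =
        from text in Token.EqualTo(PrjToken.WhiteSpace)
        select text;

    // A title is '#', the title text, and the newline that ends the line.
    public static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from wh in Whitespace
        select text;

    // A chapter consumes its own trailing newline, if there is one.
    public static readonly TokenListParser<PrjToken, string> Chapter =
        from chapterName in Text
        from wh in Whitespace.Optional()
        select chapterName;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Chapter.Many()
        select new Book(title, chapters);

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.Many()
        select new Library(books);
}

// Hypothetical driver class, not part of the original answer.
public static class Program
{
    // Tokenizes the input with the question's tokenizer and runs the Library parser.
    public static Library ParseLibrary(string input)
    {
        var tokenizer = new TokenizerBuilder<PrjToken>()
            .Match(Character.EqualTo('#'), PrjToken.Hash)
            .Match(Span.Regex("[^\r\n#:=-]*"), PrjToken.Text)
            .Match(Span.WhiteSpace, PrjToken.WhiteSpace)
            .Build();

        return MyParsers.Library.Parse(tokenizer.Tokenize(input));
    }

    public static void Main()
    {
        var input = "#Book 1\nChapter 1\nChapter 2\n#Book 2\nChapter 1\nChapter 2\nChapter 3";
        var library = ParseLibrary(input);

        // Print each book title with its chapters indented below it.
        foreach (var book in library.Books)
        {
            Console.WriteLine(book.Title);
            foreach (var chapter in book.Chapters)
            {
                Console.WriteLine("  " + chapter);
            }
        }
    }
}
```

The sample input deliberately has no newline after the last chapter, which is the case the optional Whitespace in Chapter is meant to cover.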
