I would like to parse the books of a library expressed in a format like this:
#Book title 1
Chapter 1
Chapter 2
#Book title 2
Chapter 1
Chapter 2
Chapter 3
As you can see, each book title is preceded by a # and the chapters of that book are the lines that follow it. It should be rather easy to create a parser for this.
So far, I have this code (parsers + tokenizer):
void Main()
{
    var tokenizer = new TokenizerBuilder<PrjToken>()
        .Match(Superpower.Parsers.Character.EqualTo('#'), PrjToken.Hash)
        .Match(Span.Regex("[^\r\n#:=-]*"), PrjToken.Text)
        .Match(Span.WhiteSpace, PrjToken.WhiteSpace)
        .Build();

    var input = @"#Book 1
Chapter 1
Chapter 2
#Book 2
Chapter 1
Chapter 2
Chapter 3";

    var library = MyParsers.Library.Parse(tokenizer.Tokenize(input));
}

public enum PrjToken
{
    WhiteSpace,
    Hash,
    Text
}

public class Book
{
    public string Title { get; }
    public string[] Chapters { get; }

    public Book(string title, string[] chapters)
    {
        Title = title;
        Chapters = chapters;
    }
}

public class Library
{
    public Book[] Books { get; }

    public Library(Book[] books)
    {
        Books = books;
    }
}

public class MyParsers
{
    public static readonly TokenListParser<PrjToken, string> Text =
        from text in Token.EqualTo(PrjToken.Text)
        select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Superpower.Model.Token<PrjToken>> Whitespace =
        from text in Token.EqualTo(PrjToken.WhiteSpace)
        select text;

    public static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from wh in Whitespace
        select text;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Text.ManyDelimitedBy(Whitespace)
        select new Book(title, chapters);

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.ManyDelimitedBy(Whitespace)
        select new Library(books);
}
The above code is ready to run in .NET Fiddle at this link: https://dotnetfiddle.net/3P5dAJ
Everything looks fine. However, something is wrong with the parser because I'm getting this error:
Syntax error (line 4, column 1): unexpected hash `#`, expected text.
What's wrong with my parsers?
You can solve this by parsing the chapters as a separate list, where each chapter ends with a Whitespace token.
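As a minimal sketch of that idea: each chapter consumes its own trailing Whitespace token, and the chapters are collected with Many() rather than ManyDelimitedBy(Whitespace). The names ChapterParsers, Chapter and Chapters are illustrative assumptions, and the PrjToken enum is repeated from the question only so that the snippet stands on its own:

```csharp
using Superpower;
using Superpower.Model;
using Superpower.Parsers;

public enum PrjToken { WhiteSpace, Hash, Text }

public static class ChapterParsers
{
    public static readonly TokenListParser<PrjToken, string> Text =
        from text in Token.EqualTo(PrjToken.Text)
        select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Token<PrjToken>> Whitespace =
        Token.EqualTo(PrjToken.WhiteSpace);

    // A chapter is a Text token followed by the Whitespace (newline) that ends it.
    public static readonly TokenListParser<PrjToken, string> Chapter =
        from chapterName in Text
        from wh in Whitespace
        select chapterName;

    // Zero or more chapters. Parsing stops at the first token that is not Text,
    // such as the Hash that starts the next book.
    public static readonly TokenListParser<PrjToken, string[]> Chapters = Chapter.Many();
}
```

In the Book parser from the question, the line that reads Text.ManyDelimitedBy(Whitespace) would then become Chapter.Many().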
In essence, I think that when Text.ManyDelimitedBy(Whitespace) encounters the trailing whitespace (newline) at the end of Chapter 2, it expects another chapter name, not the start of a new book.
The parser cannot distinguish between the delimiter between chapters and the delimiter between books (both are whitespace, i.e. a newline), so it expects another chapter rather than the start of a new Book.
By breaking the chapter parser up into a Text token followed by a Whitespace token, you remove this ambiguity.
Since the Whitespace at the end of each chapter is now consumed by the chapter parser, the books are no longer delimited by Whitespace, so you have to change how the books are parsed as well: they now follow one another directly instead of being separated by a Whitespace token.
In addition, if you want the file to parse without a newline at the end, the Whitespace at the end of the Chapter has to be optional.
Putting these changes together gives the complete parser.
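A sketch of how the complete parser could look with the three changes described above applied to the question's code. The Chapter parser name, the ParseLibrary helper, the use of Many() and OptionalOrDefault(), and the regex quantifier + (instead of * in the question, so that the Text token can never be empty) are assumptions about one way to express these changes in Superpower, not a verbatim listing:

```csharp
using Superpower;
using Superpower.Model;
using Superpower.Parsers;
using Superpower.Tokenizers;

public enum PrjToken { WhiteSpace, Hash, Text }

public class Book
{
    public string Title { get; }
    public string[] Chapters { get; }
    public Book(string title, string[] chapters) { Title = title; Chapters = chapters; }
}

public class Library
{
    public Book[] Books { get; }
    public Library(Book[] books) { Books = books; }
}

public static class MyParsers
{
    public static readonly TokenListParser<PrjToken, string> Text =
        from text in Token.EqualTo(PrjToken.Text)
        select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Token<PrjToken>> Whitespace =
        Token.EqualTo(PrjToken.WhiteSpace);

    public static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from wh in Whitespace
        select text;

    // Each chapter swallows its own trailing newline. The newline is optional
    // so that the last chapter in the file does not need one.
    public static readonly TokenListParser<PrjToken, string> Chapter =
        from chapterName in Text
        from wh in Whitespace.OptionalOrDefault()
        select chapterName;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Chapter.Many()
        select new Book(title, chapters);

    // Books are no longer delimited by Whitespace; they simply follow one another.
    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.Many()
        select new Library(books);
}

public static class Program
{
    // Hypothetical helper: tokenizes the input and runs the Library parser.
    public static Library ParseLibrary(string input)
    {
        var tokenizer = new TokenizerBuilder<PrjToken>()
            .Match(Character.EqualTo('#'), PrjToken.Hash)
            .Match(Span.Regex("[^\r\n#:=-]+"), PrjToken.Text)
            .Match(Span.WhiteSpace, PrjToken.WhiteSpace)
            .Build();

        return MyParsers.Library.Parse(tokenizer.Tokenize(input));
    }

    public static void Main()
    {
        var library = ParseLibrary(
            "#Book 1\nChapter 1\nChapter 2\n#Book 2\nChapter 1\nChapter 2\nChapter 3");

        foreach (var book in library.Books)
            System.Console.WriteLine($"{book.Title}: {book.Chapters.Length} chapter(s)");
    }
}
```

Note that the Title parser still requires a Whitespace token after the title, so a book title on the very last line of the file with no trailing newline would not parse with this sketch.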