I need a parser that extracts only the important, marked parts of a text file. This is a sample input:
else before 1
else before 2
--Start Query 1
important 1
--End 1
else between 1 and 2 - 1
else between 1 and 2 - 2
--Start Query 2
important 2
--End 2
else after1-1
else after1-2
I wrote this parser:
public class ExpressionDefinition extends GrammarDefinition {
    {
        def("start", ref("expr").star().end());
        def("nl", of("\r\n").or(of("\n").or(of("\r"))));
        def("expr",
            ref("else").starLazy(ref("expr_start").flatten())
                .seq(ref("expr_start"))
                .seq(ref("expr_body"))
                .seq(ref("expr_end"))
                .seq(ref("else").starLazy(ref("expr_start")).optional()).map(in -> {
                    // Keep only the Body element of the parsed sequence.
                    if (in instanceof List) {
                        for (Object o : (List<?>) in) {
                            if (o instanceof Body) {
                                return o;
                            }
                        }
                    }
                    return null;
                }));
        def("expr_start", of("--Start Query").seq(any().starLazy(ref("nl")), ref("nl")));
        def("expr_body", any().starLazy(ref("expr_end")).flatten().map((String in) -> new Body(in)));
        def("expr_end", of("--End").seq(any().starLazy(ref("nl")).optional(), ref("nl").optional()));
        def("else", any().starLazy(ref("nl")).seq(ref("nl")));
    }
}
It uses this small utility POJO to hold the important data:
@Data
@AllArgsConstructor
public static class Body {
    private final String val;

    @Override
    public String toString() { return val; }
}
I run it like this:
ExpressionDefinition def = new ExpressionDefinition();
Parser parser = def.build();
Result result = parser.parse(input);
And it throws an exception:
org.petitparser.context.ParseError: end of input expected
I see no obvious reason for this: the last line is else content, which the star should accept, and it is part of expr: ref("else").starLazy(ref("expr_start")).optional()
How can I change the parser so that each expr may end with any number of else lines, whether or not the input ends with a newline character? Making else greedy makes it consume the second expr_body. Making it any().optional() causes an infinite loop.
Is there a solution to this?
You probably want to use the a.delimitedBy(b) operator, which gives you a parser that consumes a one or more times, separated and possibly ended by the argument b. If you need more control, have a look at how it is implemented.
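To check what the grammar should return for the sample input, it can help to have a reference that does not depend on PetitParser at all. The following is only a minimal plain-Java sketch, not a PetitParser solution. The class name MarkedBlockExtractor and the method extract are hypothetical, and it assumes that the "--Start Query" and "--End" markers always begin a line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PetitParser: a line-based reference extractor.
public class MarkedBlockExtractor {

    // Returns the text between each "--Start Query" line and the next "--End" line.
    static List<String> extract(String input) {
        List<String> bodies = new ArrayList<>();
        StringBuilder current = null; // non-null only while inside a marked block
        // Split on \r\n, \n, or \r, mirroring the "nl" rule of the grammar.
        for (String line : input.split("\r\n|\n|\r")) {
            if (current == null) {
                if (line.startsWith("--Start Query")) {
                    current = new StringBuilder();
                }
                // Any other line outside a block is "else" content and is skipped.
            } else if (line.startsWith("--End")) {
                bodies.add(current.toString());
                current = null;
            } else {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        // A block that is never closed by "--End" is dropped (an assumption of this sketch).
        return bodies;
    }

    public static void main(String[] args) {
        String input = String.join("\n",
            "else before 1",
            "--Start Query 1",
            "important 1",
            "--End 1",
            "else between 1 and 2 - 1",
            "--Start Query 2",
            "important 2",
            "--End 2",
            "else after1-2"); // no trailing newline, as in the failing case
        System.out.println(extract(input)); // prints [important 1, important 2]
    }
}
```

Because the input is split into lines up front, a missing newline at the end of the input needs no special handling, which is the case the grammar above trips over.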