Problem statement
Suppose I have two parsers p
and q
and I concatenate them like this:
r = try p *> q
In Parsec, the behavior of this is:
- If
p
fails without consuming input, thenr
fails without consuming input. - If
p
fails after consuming input, thenr
fails without consuming input. - if
q
fails without consuming input, thenr
fails after consumingp
. - if
q
fails after consuming input, thenr
fails after consumingp
and parts ofq
.
However, the behavior I'm looking for is a bit unusual:
- If
p
fails without consuming input, thenr
should fail without consuming input. - If
p
fails after consuming input, thenr
should fail without consuming input. - if
q
fails without consuming input, thenr
should fail without consuming input. - if
q
fails after consuming input, thenr
should fail after consuming some input.
I can't seem to think of a clean way to do this.
Rationale
The reason is that I have a parser like this:
s = (:) <$> q <*> many r
The q
parser, embedded inside the r
parser, needs a way to signal either: invalid input (which occurs when q
consumes input but fails), or end of the many
loop (which occurs when q
doesn't consume anything and fails). If the input is invalid, it should just fail the parser entirely and report the problem to the user. If there is no more input to consume, then it should end the many
loop (without reporting a parser error to the user). The trouble is that it's possible that the input ends with a p
but without any more valid q
's to consume, in which case q
will fail but without consuming any input.
So I was wondering if anyone have an elegant way to solve this problem? Thanks.
Addendum: Example
p = string "P"
q = (++) <$> try (string "xy") <*> string "z"
Test input on (hypothetical) parser s
, had it worked the way I wanted:
xyz
(accept)xyzP
(accept;P
remains unparsed)xyzPx
(accept;Px
remains unparsed;q
failed but did not consume any input)xyzPxy
(reject; parserq
consumedxy
but failed)xyzPxyz
(accept)
In the form r = try p *> q
, s
will actually reject both case #2 and case #3 above. Of course, it's possible to achieve the above behavior by writing:
r = (++) <$> try (string "P" *> string "xy") <*> string "z"
but this isn't a general solution that works for any parsers p
and q
. (Perhaps a general solution doesn't exist?)
I believe I found a solution. It's not especially nice, but seems to works. At least something to start with:
This is the
r
combinator you're asking for. The idea is that we first execute the parsers in a "test" run (usinglookAhead . try
) and if any of them fails without consuming input, we record it asNothing
insideMaybeT
. This is accomplished byopt
, it converts a failure toNothing
and wraps it intoMaybeT
. Thanks toMaybeT
, ifopt p
returnsNothing
,opt q
is skipped.If both
p
andq
succeed, thetest ..
part returnsJust ()
. And if one of them consumes input, the wholetest ..
fails. This way, we distinguish the 3 possibilities:p
orq
.After
<|> pure (Just ())
both 1. and 3. result inJust ()
, while 2. results in Nothing. Finally, themaybe
part convertsNothing
into a non-consuming failure, andJust ()
into running the parsers again, now without any protection. This means that 1. fails again with consuming some input, and 3. succeeds.Testing: