Custom Nom parser error without custom ErrorKind

I have a small parser that fails if the number it parses is out of bounds:

use nom::IResult;
use nom::bytes::complete::tag;
use nom::character::complete::digit1;
use nom::combinator::fail;

fn dup(s: &str) -> IResult<&str, u64> {
    let (s, _) = tag("dup")(s)?;
    let (s, n) = digit1(s)?;

    let n = match n {
        "0" => 0,
        "1" => 1,
        "2" => 2,
        _ => return fail(s), // FIXME: Annotate.
    };

    Ok((s, n))
}

(See on playground.)

But I'd like for the error to be more meaningful.

How do I annotate this error with the context that n is out of bounds?

E.g. instead of fail(), use something that provides an error message.

Or somehow wrap a part of the parser in something that provides this context.

I know that you can create your own custom ErrorKind, but can it be done without? (When you have an ErrorKind, error_position!() and error_node_position!() macros will work.)

There are 2 best solutions below

0
On BEST ANSWER

You probably want to read Error Management.

Broadly speaking, nom::error::Error is a low-overhead type for parser errors, which is why it only records the input position and the ErrorKind of the parser that failed.

If you want to attach more context, nom provides the nom::error::VerboseError type. There are also ancillary crates that provide further error wrappers.

Finally, if that is still not sufficient, nom's error handling is built around the ParseError trait, so you can write a completely custom error type and implement that trait for it. This option obviously has the highest overhead, but also the highest level of flexibility.

0
On

Thanks to Masklinn's answer (which I've marked as accepted), I found a way to add custom error messages without adding a custom error type: use VerboseError and convert_error() instead of the default Error, since VerboseError is capable of embedding context. Here's a modified example (also on playground):

use nom::IResult;
use nom::bytes::complete::tag;
use nom::character::complete::digit1;
use nom::combinator::fail;
use nom::error::{context, VerboseError};
use nom::error::convert_error;
use nom::Finish;

fn dup(s: &str) -> IResult<&str, u64, VerboseError<&str>> {
    let (s, _) = tag("dup")(s)?;
    let (sd, n) = digit1(s)?;

    let n = match n {
        "0" => 0,
        "1" => 1,
        "2" => 2,
        _ => return fail(s), // FIXME: Annotate.
    };

    Ok((sd, n))
}

fn main() {
    let input = "dup3";
    let result = context("dup", dup)(input).finish().err().unwrap();
    println!("{}", convert_error(input, result));
}

Adding context("dup", dup) provides a quite beautiful and readable context to the error message:

0: at line 1, in Fail:
dup3
   ^

1: at line 1, in dup:
dup3
^

but it does not add clarity at the innermost layer. If I add a context on the fail line:

let n = match n {
    "0" => 0,
    "1" => 1,
    "2" => 2,
    _ => return context("using an out-of-bounds dup", fail)(s),
};

then the message becomes

0: at line 1, in Fail:
dup3
   ^

1: at line 1, in using an out-of-bounds dup:
dup3
   ^

2: at line 1, in dup:
dup3
^

which is almost what I want! But I really just want to replace the message "in Fail" with "in using an out-of-bounds dup", not add to it. It is worth mentioning here what the Error Management docs say about convert_error():

Note that VerboseError and convert_error are meant as a starting point for language errors, but that they cannot cover all use cases. So a custom convert_error function should probably be written.

So the least complicated way I've found to add custom annotation/context to error messages is to use VerboseError, but replace convert_error with a function that drops a leading ErrorKind::Fail or ErrorKind::Eof entry when it is followed by a context entry. In that case I expect the context to sit at the same position, so the Fail/Eof entry is just a duplicate:

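use nom::error::{convert_error, ErrorKind, VerboseError, VerboseErrorKind};

fn pretty_print_error(s: &str, mut e: VerboseError<&str>) -> String {
    // Drop the root Fail/Eof entry only when the next entry is a context at
    // the same position; the context already describes that failure.
    let root_is_duplicate = matches!(
        e.errors.as_slice(),
        [
            (root_s, VerboseErrorKind::Nom(ErrorKind::Fail | ErrorKind::Eof)),
            (ctx_s, VerboseErrorKind::Context(_)),
            ..
        ] if root_s == ctx_s
    );
    if root_is_duplicate {
        e.errors.remove(0);
    }
    convert_error(s, e)
}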

Simpler solutions are welcome.