Rust Nom: How to do recursive inline parsing


I'm using nom to create a parser for a specific markdown flavor.

This flavor also includes the basic bold, italic, and strikethrough options for text inside a paragraph. I managed to get this working for non-nested text like

this is a **bold** and __italic__ text

but not for

this is a **__bold and italic__** text

as I would need to call the inline parser recursively.

This is my code. Basically, I use the parse_inline_text function in the main parser at the end to detect paragraphs.

#[derive(Clone, Debug, PartialEq)]
pub enum InlineElement {
    Bold(String),
    Italic(String),
    StrikeThrough(String),
    Text(String),
}

fn enclosed<'a>(start: &'a str, end: &'a str) -> impl FnMut(&'a str) -> IResult<&'a str, &str> {
    map(tuple((tag(start), take_until(end), tag(end))), |x| (x.1))
}

fn parse_text_bold(i: &str) -> IResult<&str, &str> {
    enclosed("**", "**")(i)
}

fn parse_text_italics(i: &str) -> IResult<&str, &str> {
    enclosed("__", "__")(i)
}

fn parse_text_strike_through(i: &str) -> IResult<&str, &str> {
    enclosed("~", "~")(i)
}

fn parse_text_plain(i: &str) -> IResult<&str, String> {
    map(
        many1(preceded(
            not(alt((
                tag("*"),
                tag("_"),
                tag("~"),
                tag("\n"),
            ))),
            take(1u8),
        )),
        |vec| vec.join(""),
    )(i)
}

fn parse_inline(i: &str) -> IResult<&str, InlineElement> {
    alt((
        map(parse_text_bold, |s: &str| {
            InlineElement::Bold(s.to_string())
        }),
        map(parse_text_italics, |s: &str| {
            InlineElement::Italic(s.to_string())
        }),
        map(parse_text_strike_through, |s: &str| {
            InlineElement::StrikeThrough(s.to_string())
        }),
        map(parse_text_plain, |s| InlineElement::Text(s.to_string())),
    ))(i)
}

pub fn line_seperator(chr: char) -> bool {
    return chr == '\n';
}

fn parse_inline_text<'a>(
    i: &str,
) -> Result<(&str, Vec<InlineElement>), nom::Err<nom::error::Error<&str>>> {
    terminated(
        many0(parse_inline),
        tuple((tag("\n"), take_while(line_seperator))),
    )(i)
}

Currently, it returns a Vec<InlineElement> in which every InlineElement holds a String. To make it work recursively, each variant except Text should contain a Vec<InlineElement> instead. How can I solve this with nom?
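To make the goal concrete, here is a hand-built sketch of the value a recursive parser would be expected to produce for the nested example above. It only includes the variants needed for that input, and `expected_tree` is a hypothetical helper name used for illustration, not part of my parser.

```rust
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum InlineElement {
    Bold(Vec<InlineElement>),
    Italic(Vec<InlineElement>),
    StrikeThrough(Vec<InlineElement>),
    Text(String),
}

/// Hypothetical helper: builds by hand the tree that a recursive parser
/// should return for "this is a **__bold and italic__** text".
#[allow(dead_code)]
fn expected_tree() -> Vec<InlineElement> {
    vec![
        InlineElement::Text("this is a ".to_string()),
        // Bold wraps Italic, which wraps the innermost plain text.
        InlineElement::Bold(vec![InlineElement::Italic(vec![InlineElement::Text(
            "bold and italic".to_string(),
        )])]),
        InlineElement::Text(" text".to_string()),
    ]
}
```

Printing `expected_tree()` with `{:#?}` shows the nesting: only Text holds a String, every other variant holds child elements.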

This is what I tried so far. Unfortunately, it panics.

#[derive(Clone, Debug, PartialEq)]
pub enum InlineElement {
    Bold(Vec<InlineElement>),
    Italic(Vec<InlineElement>),
    Complex(Vec<InlineElement>),
    Formula(Vec<InlineElement>),
    StrikeThrough(Vec<InlineElement>),
    Text(String),
}

fn enclosed<'a>(start: &'a str, end: &'a str) -> impl FnMut(&'a str) -> IResult<&'a str, Vec<InlineElement>> {
    map(tuple((tag(start), take_until(end), tag(end))), |x| parse_inline_text(x.1).unwrap().1)
}


fn parse_text_bold(i: &str) -> IResult<&str, Vec<InlineElement>> {
    enclosed("**", "**")(i)
}

fn parse_text_italics(i: &str) -> IResult<&str, Vec<InlineElement>> {
    enclosed("__", "__")(i)
}

fn parse_text_strike_through(i: &str) -> IResult<&str, Vec<InlineElement>> {
    enclosed("~", "~")(i)
}


fn parse_inline(i: &str) -> IResult<&str, InlineElement> {
    alt((
        map(parse_text_bold, |e: Vec<InlineElement>| {
            InlineElement::Bold(e)
        }),
        map(parse_text_italics, |e: Vec<InlineElement>| {
            InlineElement::Italic(e)
        }),
        map(parse_text_strike_through, |e: Vec<InlineElement>| {
            InlineElement::StrikeThrough(e)
        }),
        map(parse_text_plain, |s| InlineElement::Text(s.to_string())),
    ))(i)
}

My guess is that the parse_inline_text(x.1).unwrap().1 is wrong but I just don't know how to fix the enclosed function to return the result of parse_inline_text.


There are 2 best solutions below

Answer by true equals false:

The problem is that parse_inline_text expects its input to end with a newline, but the text captured between the delimiters (for example, between the asterisks) does not. By changing this:

fn parse_inline_text<'a>(
    i: &str,
) -> Result<(&str, Vec<InlineElement>), nom::Err<nom::error::Error<&str>>> {
    terminated(
        many0(parse_inline),
        tuple((tag("\n"), take_while(line_seperator))),
    )(i)
}

to this:

fn parse_inline_text<'a>(
    i: &str,
) -> Result<(&str, Vec<InlineElement>), nom::Err<nom::error::Error<&str>>> {
    terminated(
        many0(parse_inline),
        take_while(line_seperator),
    )(i)
}

it works. Here is a full working example (playground):

use nom::sequence::terminated;
use nom::{
    branch::alt,
    bytes::complete::tag,
    bytes::complete::take,
    bytes::complete::take_until,
    combinator::map,
    combinator::not,
    multi::{many0, many1},
    sequence::preceded,
    sequence::tuple,
    IResult,
};
use nom::bytes::complete::take_while;

#[derive(Clone, Debug, PartialEq)]
pub enum InlineElement {
    Bold(Vec<InlineElement>),
    Italic(Vec<InlineElement>),
    Complex(Vec<InlineElement>),
    Formula(Vec<InlineElement>),
    StrikeThrough(Vec<InlineElement>),
    Text(String),
}

fn enclosed<'a>(
    start: &'a str,
    end: &'a str,
) -> impl FnMut(&'a str) -> IResult<&'a str, Vec<InlineElement>> {
    map(tuple((tag(start), take_until(end), tag(end))), |x| {
        parse_inline_text(x.1).unwrap().1
    })
}

fn parse_text_bold(i: &str) -> IResult<&str, Vec<InlineElement>> {
    enclosed("**", "**")(i)
}

fn parse_text_italics(i: &str) -> IResult<&str, Vec<InlineElement>> {
    enclosed("__", "__")(i)
}

fn parse_text_strike_through(i: &str) -> IResult<&str, Vec<InlineElement>> {
    enclosed("~", "~")(i)
}

fn parse_text_plain(i: &str) -> IResult<&str, String> {
    map(
        many1(preceded(
            not(alt((tag("*"), tag("_"), tag("~"), tag("\n")))),
            take(1u8),
        )),
        |vec| vec.join(""),
    )(i)
}

pub fn line_seperator(chr: char) -> bool {
    return chr == '\n';
}

fn parse_inline_text<'a>(
    i: &str,
) -> Result<(&str, Vec<InlineElement>), nom::Err<nom::error::Error<&str>>> {
    terminated(
        many0(parse_inline),
        take_while(line_seperator),
    )(i)
}

fn parse_inline(i: &str) -> IResult<&str, InlineElement> {
    alt((
        map(parse_text_bold, |e: Vec<InlineElement>| {
            InlineElement::Bold(e)
        }),
        map(parse_text_italics, |e: Vec<InlineElement>| {
            InlineElement::Italic(e)
        }),
        map(parse_text_strike_through, |e: Vec<InlineElement>| {
            InlineElement::StrikeThrough(e)
        }),
        map(parse_text_plain, |s| InlineElement::Text(s.to_string())),
    ))(i)
}

fn main() {
    let input = "this is a **__bold and italic__** text";
    let out = parse_inline_text(input).unwrap();
    dbg!(out);
}

If it is important that parse_inline_text expects a newline at the end, then it is incorrect to use this function inside enclosed, unless input such as **text that does not end in newline** should be treated as invalid. In that case, you should create a new function similar to parse_inline_text that doesn't expect a newline and use that in enclosed.
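To see why the newline requirement breaks inside enclosed, it may help to look at the slice that reaches the inner parser. The sketch below uses only the standard library; `between` is a hypothetical stand-in that imitates what tuple((tag(start), take_until(end), tag(end))) captures, not nom's actual implementation.

```rust
/// Hypothetical stand-in for `tuple((tag(start), take_until(end), tag(end)))`:
/// returns the slice between the opening marker and the first closing marker.
#[allow(dead_code)]
fn between<'a>(input: &'a str, start: &str, end: &str) -> Option<&'a str> {
    // Like `tag(start)`: the input must begin with the opening marker.
    let after_start = input.strip_prefix(start)?;
    // Like `take_until(end)`: stop at the first occurrence of the closing marker.
    let idx = after_start.find(end)?;
    Some(&after_start[..idx])
}

/// Returns true if the slice handed to the inner parser contains a newline.
#[allow(dead_code)]
fn inner_has_newline(line: &str) -> bool {
    between(line, "**", "**").map_or(false, |inner| inner.contains('\n'))
}
```

For the input "**__bold and italic__** text\n", `between` yields "__bold and italic__". That slice has no newline even though the full line does, so a parser that insists on a trailing "\n" returns an error there and the unwrap panics.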

Answer by hypnomaki:

While @true-equals-false's answer works, another clean way to get this working is to change the enclosed function so that it parses the nested elements directly instead of capturing a slice first.

fn enclosed<'a>(
    start: &'a str,
    end: &'a str,
) -> impl FnMut(&'a str) -> IResult<&'a str, Vec<InlineElement>> {
    map(
        tuple((
            tag(start),
            many0(parse_inline), // change 'take_until(end)' to this
            tag(end),
        )),
        // keep only the nested elements, dropping both delimiters
        |(_, elements, _)| elements,
    )
}
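To illustrate the control flow of this approach without depending on nom, here is a minimal hand-written sketch using only the standard library. The names `parse_many` and `parse_one` are hypothetical stand-ins for many0(parse_inline) and parse_inline, and the sketch only approximates their behavior (for example, it reports failure with Option rather than nom's error types).

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum InlineElement {
    Bold(Vec<InlineElement>),
    Italic(Vec<InlineElement>),
    StrikeThrough(Vec<InlineElement>),
    Text(String),
}

/// Stand-in for `many0(parse_inline)`: collects elements until none matches
/// and returns the unparsed rest together with them.
pub fn parse_many(mut input: &str) -> (&str, Vec<InlineElement>) {
    let mut out = Vec::new();
    while let Some((rest, element)) = parse_one(input) {
        out.push(element);
        input = rest;
    }
    (input, out)
}

/// Stand-in for `enclosed`: opening marker, nested elements, closing marker.
fn enclosed<'a>(input: &'a str, marker: &str) -> Option<(&'a str, Vec<InlineElement>)> {
    let inner = input.strip_prefix(marker)?;
    // The recursive step: the content is parsed as inline elements again.
    let (rest, elements) = parse_many(inner);
    // Fails (and lets the caller backtrack) if the closing marker is missing.
    let rest = rest.strip_prefix(marker)?;
    Some((rest, elements))
}

/// Stand-in for `parse_inline`: tries each alternative in order.
fn parse_one(input: &str) -> Option<(&str, InlineElement)> {
    if let Some((rest, els)) = enclosed(input, "**") {
        return Some((rest, InlineElement::Bold(els)));
    }
    if let Some((rest, els)) = enclosed(input, "__") {
        return Some((rest, InlineElement::Italic(els)));
    }
    if let Some((rest, els)) = enclosed(input, "~") {
        return Some((rest, InlineElement::StrikeThrough(els)));
    }
    // Plain text: everything up to the next marker character or newline.
    let end = input
        .find(|c: char| matches!(c, '*' | '_' | '~' | '\n'))
        .unwrap_or(input.len());
    if end == 0 {
        return None; // no progress possible, so the caller's loop stops
    }
    Some((&input[end..], InlineElement::Text(input[..end].to_string())))
}
```

Calling `parse_many("this is a **__bold and italic__** text")` returns an empty remainder and three elements: a Text, a Bold containing an Italic containing a Text, and a trailing Text. An unmatched marker, as in "a ~b", is not consumed; parsing stops there and the remainder is returned to the caller. Because a failed closing marker discards the nested work and retries, deeply nested unmatched markers can be slow with this kind of backtracking.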