How to write a boost::spirit::qi parser to do what '?' does in regex?

557 Views Asked by wanghan02 At 14 October 2016 at 12:51

Let's say we have a regex "start:(?: ([0-9]{1,2}))? ([0-9].*)".

It will match

std::string string1 = "start: 01 0ab";

and

std::string string2 = "start: 0ab";

We can also get the 2 matched string respectively.

I try to use boost::spirit::qi parser to parse string2 but it couldn't match.

qi::rule<std::string::const_iterator, std::string()> rule1 = qi::repeat(1,2)[qi::digit];
qi::rule<std::string::const_iterator, std::string()> rule2 = qi::digit >> *qi::char_;
std::vector<std::string> attr;
auto it_begin = string2.begin();
auto it_end = string2.end();
if (qi::parse(
    it_begin,
    it_end,
    qi::lit("start:")
         >> -(qi::lit(" ") >> rule1)
         >> qi::lit(" ") >> rule2
         >> qi::eoi,
    attr))
    std::cout<<"match"<<std::endl;
else
    std::cout<<"not match"<<std::endl;

We can of course use a look-ahead operator to check what's behind rule1, but is there a more generic approach to implement regex operator '?' ? Thanks!

Original Q&A

There are 1 best solutions below

sehe On 15 October 2016 at 00:17 BEST ANSWER

I'm not sure what's wrong with the expectation. It is the only way for otherwise ambiguous rules, since PEG grammars are always greedy.

However, maybe you didn't arrive at the most elegant form, since you were looking for something "better". Here's what I'd do.

I'd use a skipper to match spaces¹:

    if (qi::phrase_parse(it_begin, it_end,
                "start:" >> -rule1 >> rule2 >> qi::eoi,
                qi::space, attr))

Where the rules are still lexemes (because there were declared without the skipper):

qi::rule<It, std::string()> const 
    rule1 = qi::digit >> qi::digit >> &qi::space,
    rule2 = qi::digit >> *qi::graph;

Note qi::graph doesn't match whitespace, where *qi::char_ simply matches anything at all greedily.

Live On Coliru

#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;

int main() {
    using It = std::string::const_iterator;

    // implicitly lexemes (no skipper in rule declaration)
    qi::rule<It, std::string()> const 
        rule1 = qi::digit >> qi::digit >> &qi::space,
        rule2 = qi::digit >> *qi::graph;

    for (std::string const input : { "start: 01 0ab", "start: 0ab", }) {
        std::vector<std::string> attr;

        auto it_begin = input.begin();
        auto it_end   = input.end();

        if (qi::phrase_parse(it_begin, it_end, "start:" >> -rule1 >> rule2 >> qi::eoi, qi::space, attr))
            std::cout << "match\n";
        else
            std::cout << "not match\n";

        if (it_begin!=it_end)
            std::cout<<"Remaining unparsed input: '" << std::string(it_begin, it_end) << "'\n";
    }
}

Prints

match
match

¹ this assumes that multiple/different whitespace is okay. If newlines should not count as whitespace, use qi::blank instead of qi::space

How to write a boost::spirit::qi parser to do what '?' does in regex?

There are 1 best solutions below

Related Questions in C++

Related Questions in REGEX

Related Questions in BOOST-SPIRIT

Related Questions in BACKTRACKING

Related Questions in QI

Trending Questions

Popular # Hahtags

Popular Questions