How to use a slash in Spirit Lex patterns?

Question

Asked by user1587451 on 14 July 2015 at 13:47

The code below compiles fine with

clang++ -std=c++11 test.cpp -o test

but when it runs, an exception is thrown:

terminate called after throwing an instance of 'boost::lexer::runtime_error'
  what():  Lookahead ('/') is not supported yet.

The problem is the slash (/) in the input and/or the regex (the global regex string and the input string in main), but I can't find out how to escape it correctly. Any hints?

#include <string>
#include <cstring>
#include <iostream>
#include <boost/spirit/include/lex.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace lex        = boost::spirit::lex;
namespace qi         = boost::spirit::qi;
namespace phoenix    = boost::phoenix;

std::string regex("FOO/BAR");  // the unescaped '/' here makes lexertl throw at runtime

template <typename Type>
struct Lexer : boost::spirit::lex::lexer<Type> {
    Lexer() : foobar_(regex) {
        this->self.add(foobar_);
    }
    boost::spirit::lex::token_def<std::string> foobar_;
};

template <typename Iterator, typename Def>
struct Grammar
  : qi::grammar <Iterator, qi::in_state_skipper<Def> > {
    template <typename Lexer> Grammar(const Lexer & _lexer);
    typedef qi::in_state_skipper<Def> Skipper;
    qi::rule<Iterator, Skipper> rule_;
};
template <typename Iterator, typename Def>
template <typename Lexer>
Grammar<Iterator, Def>::Grammar(const Lexer & _lexer)
  : Grammar::base_type(rule_) {
    rule_ = _lexer.foobar_;
}

int main() {
    // INPUT
    char const * first("FOO/BAR");
    char const * last(first + strlen(first));

    // LEXER
    typedef lex::lexertl::token<const char *> Token;
    typedef lex::lexertl::lexer<Token> Type;
    Lexer<Type> l;

    // GRAMMAR
    typedef Lexer<Type>::iterator_type Iterator;
    typedef Lexer<Type>::lexer_def Def;
    Grammar<Iterator, Def> g(l);

    // PARSE
    bool ok = lex::tokenize_and_phrase_parse (
        first
      , last
      , l
      , g
      , qi::in_state("WS")[l.self]
    );

    // CHECK
    if (!ok || first != last) {
        std::cout << "Failed parsing input file" << std::endl;
        return 1;
    }
    return 0;
}

Answer

**Cornstalks** · answered 14 July 2015 at 14:01

As sehe points out, / is most likely reserved as a lookahead operator, following flex syntax, where r/s matches r only when it is followed by s (flex calls this trailing context). It's unfortunate that Spirit doesn't use the more common (?=...) lookahead syntax (not that I think that syntax is more elegant; it's just that all the subtle variations in regex syntax get confusing).

If you look at re_tokeniser.hpp, the regex tokeniser of the lexertl engine bundled with Spirit Lex, you'll find:

// Not an escape sequence and not inside a string, so
// check for meta characters.
switch (ch_)
{
    ...
    case '/':
        throw runtime_error("Lookahead ('/') is not supported yet.");
        break;
    ...
}

The tokeniser decides that the / is neither part of an escape sequence nor inside a string, so it checks it against the metacharacters. / is treated as the metacharacter for lookahead (even though that feature isn't implemented), so it must be escaped, even though the Boost docs don't mention this at all.

Try escaping the / in the pattern (not in the input) with a backslash: write "\\/" in an ordinary string literal, or \/ in a raw string literal such as R"(FOO\/BAR)". Alternatively, others have suggested putting it in a character class: [/].

I'd consider this a bug in the Spirit Lex documentation, since it fails to point out that / must be escaped.

Edit: kudos to sehe and cv_and_he, who helped correct some of my earlier thinking. If they post an answer here, be sure to give them a +1.