I'm using boost::spirit lex and qi to parse some source code.
I already skip whitespace in the input string using the lexer. What I would like to do is turn comment skipping on or off depending on the context in the parser.
Here is a basic demo. See the comments in Grammar::Grammar() for my problem:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <cstring>   // strlen
#include <iostream>
#include <string>
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
typedef lex::lexertl::token<char const*, boost::mpl::vector<std::string>, boost::mpl::false_ > token_type;
typedef lex::lexertl::actor_lexer<token_type> lexer_type;
struct TokenId
{
    enum type
    {
        INVALID_TOKEN_ID = lex::min_token_id,
        COMMENT
    };
};
struct Lexer : lex::lexer<lexer_type>
{
public:
    lex::token_def<std::string> comment;
    lex::token_def<std::string> identifier;
    lex::token_def<std::string> lineFeed;
    lex::token_def<std::string> space;
    Lexer()
    {
        comment = "\\/\\*.*?\\*\\/|\\/\\/[^\\r\\n]*";
        identifier = "[A-Za-z_][A-Za-z0-9_]*";
        space = "[\\x20\\t\\f\\v]+";
        lineFeed = "(\\r\\n)|\\r|\\n";
        this->self = space[lex::_pass = lex::pass_flags::pass_ignore];
        this->self += lineFeed[lex::_pass = lex::pass_flags::pass_ignore];
        this->self.add
            (comment, TokenId::COMMENT)
            (identifier)
            (';')
            ;
    }
};
typedef Lexer::iterator_type Iterator;
void traceComment(const std::string& content)
{
    std::cout << " comment: " << content << std::endl;
}
class Grammar : public qi::grammar<Iterator>
{
    typedef token_type skipped_t;
    qi::rule<Iterator, qi::unused_type, qi::unused_type> m_start;
    qi::rule<Iterator, qi::unused_type, qi::unused_type, skipped_t> m_variable;
    qi::rule<Iterator, std::string(), qi::unused_type> m_comment;
public:
    Lexer lx;
public:
    Grammar() :
        Grammar::base_type(m_start)
    {
        // This does not work (comments are not skipped in m_variable)
        m_start = *(
              m_comment[phx::bind(&traceComment, qi::_1)]
            | qi::skip(qi::token(TokenId::COMMENT))[m_variable]
            );
        m_variable = lx.identifier >> lx.identifier >> ';';
        m_comment = qi::token(TokenId::COMMENT);
        /** But this works:
        m_start = *(
              m_comment[phx::bind(&traceComment, qi::_1)]
            | m_variable
            );
        m_variable = qi::skip(qi::token(TokenId::COMMENT))[lx.identifier >> lx.identifier >> ';'];
        m_comment = qi::token(TokenId::COMMENT);
        */
    }
};
void test(const char* code)
{
    std::cout << code << std::endl;
    Grammar parser;
    const char* begin = code;
    const char* end = code + strlen(code);
    tokenize_and_parse(begin, end, parser.lx, parser);
    if (begin == end)
        std::cout << "-- OK --" << std::endl;
    else
        std::cout << "-- FAILED --" << std::endl;
    std::cout << std::endl;
}
int main(int argc, char* argv[])
{
    test("/* kept */ int foo;");
    test("int /* ignored */ foo;");
    test("int foo /* ignored */;");
    test("int foo; // kept");
}
The output is:
/* kept */ int foo;
comment: /* kept */
-- OK --
int /* ignored */ foo;
-- FAILED --
int foo /* ignored */;
-- FAILED --
int foo; // kept
comment: // kept
-- OK --
Is there any issue with skipped_t?
The behavior you are describing is what I would expect from my experience.
When you wrap a sequence of foo, ',', bar, '=' and baz in a skip directive with skipper ws, this is essentially the same as writing that sequence with ws allowed before each element (assuming that ws is a rule with no attribute. If it has an attribute in your grammar, that attribute is ignored, as if using qi::omit.)
Notably, the skipper does not get propagated inside of the foo rule. So foo, bar, and baz can still be whitespace-sensitive in the above. What the skip directive is doing is causing the grammar not to care about leading whitespace in this rule, or whitespace around the ',' and '=' in this rule.
More info here: http://boost-spirit.com/home/2010/02/24/parsing-skippers-and-skipping-parsers/
Edit:
Also, I don't think the skipped_t is doing what you think it is there.
When you use a custom skipper, most straightforwardly you specify an actual instance of a parser as the skip parser for that rule. When you use a type instead of an object, e.g. qi::skip(qi::blank_type), that is a shorthand: the tag type qi::blank_type has been linked via prior template declarations to the type qi::blank, and qi knows that when it sees qi::blank_type in certain places it should instantiate a qi::blank parser object.
I don't see any evidence that you've actually set up that machinery; you've just typedef'ed skipped_t to token_type. What you should do if you want this to work that way (if it's even possible, I don't know) is read about qi customization points and instead declare qi::skipped_t as an empty struct which is linked via some template boilerplate to the rule m_comment, which is presumably what you actually want to be skipping. (If you skip all tokens of all types, then you can't possibly match anything, so that wouldn't make sense, and I'm not sure what your intention was in making token_type the skipper.)
My guess is that when qi saw that typedef token_type in your parameter list, it either ignored it or interpreted it as part of the return value of the rule or something like that; I'm not sure exactly what it would do.