Using Boost Spirit X3 with a custom lexer


How can an X3 parser consume an already generated vector of tokens? How would the rules be defined, given something like

enum class token { aa, bb, cc };
auto rule = token::aa >> token::bb >> -token::cc;
std::vector<token> tokens{token::aa, token::bb, token::cc};
auto ok = parse(tokens.cbegin(), tokens.cend(), rule);

I'm only interested in validating the input. The idea is to avoid any lexical analysis inside the grammar (x3::lit, x3::char_, x3::lexeme, x3::alpha, etc.), following the zero-overhead principle of C++.


There are 2 best solutions below

Best answer

Based on @sehe's reply, one possible implementation is:

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <cassert>
#include <iostream>
#include <optional>
#include <utility>
#include <vector>

namespace my_parser
{
    enum class token
    {
        unknown,
        aa,
        bb,
        cc
    };

    namespace ast
    {
        struct my_seq
        {
            token m1 {};
            token m2 {};
            std::optional<token> m3 {};
        };
    }

    namespace x3 = boost::spirit::x3;

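    // Primitive parser that matches exactly one token value.
    // The iterators it receives walk over tokens, not characters,
    // so no lexical analysis happens inside the grammar.
    struct dummy_token_parser: x3::parser<dummy_token_parser>
    {
        using attribute_type = token;
        static const bool has_attribute = true;

        constexpr dummy_token_parser(const token tok): tok(tok) {}

        template <typename It, typename Ctx, typename A>
        bool parse(It& first, const It& last, const Ctx& /*ctx*/, x3::unused_type, A& attr) const
        {
            if (first != last && *first == tok)
            {
                attr = *first;  // expose the matched token as the attribute
                ++first;        // consume it
                return true;
            }

            return false;       // no match: `first` is left untouched
        }

        token tok;  // the token this parser instance expects
    };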

    void parse_tokens(const std::vector<token>& input, auto& rule, ast::my_seq& my_seq_data, bool expected_result)
    {
        my_seq_data = {};
        const bool result = x3::parse(input.cbegin(), input.cend(), rule, my_seq_data);
        assert(result == expected_result);
        std::cout << result << std::endl;
    }

    using tp = dummy_token_parser;

    const x3::rule<struct my_seq_rule, ast::my_seq> my_seq {"my_seq"};
    const auto my_seq_def = tp(token::aa) >> tp(token::bb) >> -tp(token::cc);

    BOOST_SPIRIT_DEFINE(my_seq);
}

BOOST_FUSION_ADAPT_STRUCT(my_parser::ast::my_seq, m1, m2, m3);

int main()
{
    using namespace my_parser;

    ast::my_seq my_seq_data {};
    parse_tokens({token::aa, token::bb, token::cc}, my_seq, my_seq_data, true);
    parse_tokens({token::aa, token::cc}, my_seq, my_seq_data, false);
    parse_tokens({token::bb, token::cc}, my_seq, my_seq_data, false);
    parse_tokens({token::aa, token::bb}, my_seq, my_seq_data, true);

    return 0;
}
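
One thing to keep in mind with the listing above: x3::parse reports success as soon as the parser matches, even if only a prefix of the input was consumed, so an input such as {aa, bb, aa} would also be accepted. As a baseline for cross-checking the X3 version, here is a minimal hand-written sketch of the same aa >> bb >> -cc sequence that makes the distinction explicit. It does not use Spirit at all; the names seq_match, match_seq and matches_whole_input are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

enum class token { unknown, aa, bb, cc };

// Hypothetical result type: the captured tokens plus how much input was used.
struct seq_match
{
    token m1 {};
    token m2 {};
    std::optional<token> m3 {};
    std::size_t consumed = 0;   // number of tokens matched from the front
};

// Matches aa >> bb >> -cc at the start of `input`.
// Returns std::nullopt when the mandatory aa, bb prefix is missing.
inline std::optional<seq_match> match_seq(const std::vector<token>& input)
{
    auto it = input.cbegin();
    const auto last = input.cend();

    if (it == last || *it != token::aa) return std::nullopt;
    seq_match m;
    m.m1 = *it++;

    if (it == last || *it != token::bb) return std::nullopt;
    m.m2 = *it++;

    // The trailing cc is optional, so its absence is not a failure.
    if (it != last && *it == token::cc) m.m3 = *it++;

    m.consumed = static_cast<std::size_t>(it - input.cbegin());
    return m;
}

// Stricter check: the sequence must cover the whole input.
inline bool matches_whole_input(const std::vector<token>& input)
{
    const auto m = match_seq(input);
    return m && m->consumed == input.size();
}
```

In the X3 version, a comparable strict check could probably be obtained by passing non-const iterators to x3::parse and comparing the begin iterator with the end afterwards; that is a suggestion, not something the answer above does.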

X3 doesn't support Lex and probably never will.

From here:

  • using Lex makes most of the sweet-spot disappear since all "highlevel" parsers (like real_parser, [u]int_parser) are out the window. The Spirit devs are on record they prefer not to use Lex. Moreover, Spirit X3 doesn't have Lex support anymore

I quickly searched for an original quote to back that up: https://sourceforge.net/p/spirit/mailman/spirit-general/thread/CACBJYpn4YvXpKoU68cWZD4PmGG0R4kfe%3DVDu5PuTkA7S79QAoA%40mail.gmail.com/#msg34551953

There are no replacements for the lexer and I doubt X3 will ever have one, unless someone contributes some time and effort. I personally don't really use them, instead preferring on pure Qi. If you do it right, A pure Qi parser can be at par with or even outperform one with a lexer (anecdotal evidence only, not fully substantiated).

The real advantage of X3 over QI is 1) in compile time and 2) in AST building. Parsing should be more or less the same. But, again in my experience, the most time consuming operations are in AST building, not parsing and not lexing.

Apologies in advance to the onslaught of questions and thanks in advance for any insights. I'd love to migrate our parser to X3 as it seems to provide many benefits over Qi.

Again, pardon the delay in replying.

Regards,
--
Joel de Guzman

When pressed, Joel elsewhere suggested making your own simple token stream if your application really benefits from it, e.g. here: https://sourceforge.net/p/spirit/mailman/spirit-general/thread/2A5FC1FD75EA0346A44CF4BD559664CD10CCEC%40SHEX-MB-09.ad.local/#msg34855244
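
The linked message does not prescribe how such a token stream should look, so the following is only a rough sketch of one way to produce the std::vector<token> from the question. It assumes whitespace-separated words and the hypothetical spellings "aa", "bb" and "cc"; the helper names classify and tokenize are likewise made up for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

enum class token { unknown, aa, bb, cc };

// Hypothetical spelling table: maps one word to its token.
inline token classify(std::string_view word)
{
    if (word == "aa") return token::aa;
    if (word == "bb") return token::bb;
    if (word == "cc") return token::cc;
    return token::unknown;   // anything else is kept as an explicit marker
}

// Splits `text` on whitespace and classifies each word.
inline std::vector<token> tokenize(std::string_view text)
{
    const auto is_space = [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };

    std::vector<token> out;
    std::size_t pos = 0;
    while (pos < text.size())
    {
        // Skip leading whitespace.
        while (pos < text.size() && is_space(text[pos])) ++pos;

        // Collect one word.
        const std::size_t start = pos;
        while (pos < text.size() && !is_space(text[pos])) ++pos;

        if (pos > start)
            out.push_back(classify(text.substr(start, pos - start)));
    }
    return out;
}
```

The resulting vector can then be handed to a token-level parser via cbegin()/cend(), so the grammar itself never touches characters.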