Variant type without memory leak in Bison tokens


What is the best leak-free variant type to use for text tokens in Bison (for example, %token <std::string>)?

I wanted to replace char * as the variant type for tokens with a more modern type, mainly to avoid memory leaks. I have tried three types, but char * is still the fastest:

char *                          2.35 (some metric)
std::string                     2.72 (some metric)
std::shared_ptr<std::string>    2.88 (some metric)

When I replaced char * with std::string, performance dropped. I attributed this to Bison's internal stack and the default action { $$ = $1; } in each rule making redundant copies of the string: copying a std::string duplicates its contents rather than sharing them (unlike the newer std::string_view, which is a non-owning view). I then tried wrapping std::string in a class that is cheap to copy, std::shared_ptr, but to my surprise std::shared_ptr<std::string> performed even worse. Now I am completely lost.
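To make the copying argument concrete, here is a minimal sketch, independent of Bison, of what happens when each candidate type is copied or moved. The helper names (copy_duplicates_buffer, move_reuses_buffer, owners_after_copy) are made up for illustration, and the buffer reuse on move is what common standard libraries do for strings too long for the small-string optimization, not something the standard guarantees.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical helper: copying a std::string yields a second, independent
// character buffer, so the two objects never share storage.
bool copy_duplicates_buffer(const std::string& s) {
    std::string copy = s;                 // deep copy of the characters
    return copy.data() != s.data();
}

// Hypothetical helper: moving a long std::string hands the existing heap
// buffer to the new object instead of copying the characters.
// Assumption: the string is longer than the small-string buffer.
bool move_reuses_buffer(std::string s) {
    const char* before = s.data();
    std::string moved = std::move(s);
    return moved.data() == before;
}

// Hypothetical helper: copying a shared_ptr copies no characters, but it
// does update the shared reference count.
long owners_after_copy(const std::shared_ptr<std::string>& p) {
    std::shared_ptr<std::string> copy = p;  // both now own the same string
    return p.use_count();
}
```

For example, copy_duplicates_buffer(std::string(64, 'x')) is true, while owners_after_copy(std::make_shared<std::string>("abc")) reports two owners. This sketch only shows the cost model of each type; it does not show whether the generated parser copies or moves at a given step.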

The files I used are below. For now I have neglected the construction time of the three types, but including it would only make char * faster relative to the other two.

After compiling them as follows, I ran the program on a large file and averaged the system execution time:

flex -o parser.cpp parser.l
bison -v -d --output=grammar.cpp grammar.y
g++ parser.cpp grammar.cpp

sharedText.hpp

#pragma once

// Choose exactly one "using SharedText = ..." and leave the others commented out.

// option (3)
#include <memory>
#include <string>
using SharedText = std::shared_ptr<std::string>;

// option (2)
// #include <string>
// using SharedText = std::string;

// option (1)
// using SharedText = char*;

yylex.hpp

# define YY_DECL        yy::grammar::symbol_type yylex()
YY_DECL;

grammar.y

%{
#include "sharedText.hpp"
%}

%require "3.8.2"
%language "c++"
%define api.parser.class {grammar}
%define api.value.type  variant
%define api.token.constructor

%code
{
#include "yylex.hpp"
}

%token <SharedText> SYMBOL NUMBER

%start start

%%
start
    : symbol_or_number
    | symbol_or_number start
    ;

symbol_or_number
    : SYMBOL
    | NUMBER
    ;

%%

namespace yy
{
void grammar::error(const std::string& description)
{
}
}

#include <cstdio>   // FILE, fopen

extern FILE *yyin;

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        return 1;
    }

    yy::grammar parse;
    yyin = fopen(argv[1], "r");
    if (yyin == nullptr)
    {
        return 1;   // could not open the input file
    }
    parse();

    return 0;
}

parser.l

%{
#include "sharedText.hpp"
#include "grammar.hpp"
#include "yylex.hpp"
%}

%option noyywrap

%%

<INITIAL>{
[A-Za-z]+       { return yy::grammar::token::yytokentype::SYMBOL; }
[0-9]+          { return yy::grammar::token::yytokentype::NUMBER; }

"\n"            { ++yylineno; }
[ ]+            ;
.               ;
}
%%