What is the best non-memory-leaking variant type to use for texts in Bison (for example %token <std::string>)?
I want to replace char * as the variant type for tokens with a more modern type, mainly to avoid memory leaks. I tried three types, and char * is still the fastest (average system execution time in some metric, lower is better):
char * 2.35
std::string 2.72
std::shared_ptr<std::string> 2.88
When I replaced char * with std::string, performance dropped. I assumed this was because Bison's internal stack and the default action {$$ = $1} of each rule were making redundant copies of the string: copying a std::string always duplicates its characters instead of sharing them (unlike the newer std::string_view, which only refers to text owned elsewhere). So I tried wrapping std::string in a type whose copy operation is cheap, std::shared_ptr, but to my surprise std::shared_ptr<std::string> was even slower. Now I am completely lost.
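The copy behavior described above can be checked in isolation. This minimal standalone sketch (the function names are hypothetical and it is not part of the benchmark) shows that copying a std::string duplicates its characters, while copying a std::shared_ptr<std::string> shares one string and only updates the reference count. The shared_ptr version still pays for an allocation when the value is created and for reference-count updates on every copy and destruction, so it can only win when long strings are copied often. As far as I can tell from Bison's lalr1.cc skeleton, the default `$$ = $1` action is not applied when `api.value.type variant` is used, and in this grammar neither `start` nor `symbol_or_number` has a semantic type, so any copies would have to come from moving symbols on the parser stack rather than from default actions.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Copying a std::string duplicates its characters: since C++11 the standard
// rules out copy-on-write, so the copy owns a separate buffer.
bool string_copy_is_deep()
{
    std::string original(64, 'x');  // long enough to live on the heap in common implementations
    std::string copy = original;
    return copy == original && copy.data() != original.data();
}

// Copying a std::shared_ptr shares the same std::string and only increments
// the reference count.
long shared_ptr_use_count_after_copy()
{
    auto original = std::make_shared<std::string>(64, 'x');
    std::shared_ptr<std::string> copy = original;
    assert(copy.get() == original.get());  // both point to the same string
    return original.use_count();           // two owners now
}

// Moving a std::shared_ptr transfers ownership without changing the count.
long shared_ptr_use_count_after_move()
{
    auto original = std::make_shared<std::string>(64, 'x');
    std::shared_ptr<std::string> moved = std::move(original);
    return moved.use_count();              // original is now empty
}
```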
The files I used are below. For now the benchmark ignores the cost of constructing the three types (the lexer never builds a value from yytext); I expect that including it would only make char * look even faster compared to the other two.
I compiled them with the following commands, then ran the program on a large input file and averaged the system execution time:
flex -o parser.cpp parser.l
bison -v -d --output=grammar.cpp grammar.y
g++ parser.cpp grammar.cpp
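The g++ command above passes no -O flag, so GCC compiles at -O0, where the small inline functions behind std::string and std::shared_ptr are not inlined; that can exaggerate their overhead relative to char *. One way to measure the cost of the types separately from the parser is a standalone micro-benchmark built with optimization (for example `g++ -O2`). This is a minimal sketch under that assumption; `text_length` and `time_copies` are hypothetical names, and timing plain copies says nothing about how often the parser itself copies values.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Overloads that read each candidate type, so every copy below is "used".
inline std::size_t text_length(const char* text) { return std::strlen(text); }
inline std::size_t text_length(const std::string& text) { return text.size(); }
inline std::size_t text_length(const std::shared_ptr<std::string>& text) { return text->size(); }

// Hypothetical micro-benchmark: copy-constructs `count` values of type T from
// `prototype` and returns the elapsed wall-clock time in nanoseconds.
// Each copy's length is added to `sink` so the loop has an observable effect.
template <typename T>
long long time_copies(const T& prototype, std::size_t count, std::size_t& sink)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i)
    {
        T copy = prototype;  // the operation being measured
        sink += text_length(copy);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling `time_copies` once per type with the same text, and comparing the returned nanoseconds, gives a rough per-copy cost for each option.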
sharedText.hpp
#pragma once
// Uncomment exactly one "using SharedText = ..." line.
#include <memory>
#include <string>
// option (3)
// using SharedText = std::shared_ptr<std::string>;
// option (2)
// using SharedText = std::string;
// option (1)
using SharedText = char*;
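To find out whether the parser really copies token values, one option is to plug an instrumented type into sharedText.hpp as a temporary fourth option. The sketch below is hypothetical (CountingText is not part of the original files) and assumes C++17 for the inline static counters (for example `g++ -std=c++17`; recent GCC versions default to it). It wraps a std::string and counts copies and moves separately. Keep in mind that the `start` rule in grammar.y is right-recursive, so every token value stays on the parser stack until the end of input; the Bison manual recommends left recursion for long sequences because it parses them in bounded stack space.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical diagnostic type (not part of the original benchmark): a
// std::string wrapper that counts how often values are copied or moved.
// Requires C++17 for the inline static data members.
struct CountingText
{
    static inline std::size_t copies = 0;
    static inline std::size_t moves = 0;

    std::string text;

    CountingText() = default;  // needed wherever a value is default-constructed
    explicit CountingText(std::string t) : text(std::move(t)) {}

    CountingText(const CountingText& other) : text(other.text) { ++copies; }
    CountingText(CountingText&& other) noexcept : text(std::move(other.text)) { ++moves; }

    CountingText& operator=(const CountingText& other)
    {
        text = other.text;
        ++copies;
        return *this;
    }

    CountingText& operator=(CountingText&& other) noexcept
    {
        text = std::move(other.text);
        ++moves;
        return *this;
    }
};
```

With `using SharedText = CountingText;`, printing `CountingText::copies` and `CountingText::moves` after `parse()` returns shows how many of each operation the generated parser performed.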
yylex.hpp
# define YY_DECL yy::grammar::symbol_type yylex()
YY_DECL;
grammar.y
%code requires
{
#include "sharedText.hpp"
}
%require "3.8.2"
%language "c++"
%define api.parser.class {grammar}
%define api.value.type variant
%define api.token.constructor
%code
{
#include "yylex.hpp"
}
%token <SharedText> SYMBOL NUMBER
%start start
%%
start
: symbol_or_number
| symbol_or_number start
;
symbol_or_number
: SYMBOL
| NUMBER
;
%%
namespace yy
{
void grammar::error(const std::string& description)
{
}
}
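#include <cstdio>
extern std::FILE *yyin;
int main(int argc, char *argv[])
{
if (argc != 2)
{
return 1; // usage: program <input-file>
}
yyin = std::fopen(argv[1], "r");
if (yyin == nullptr)
{
return 1; // the input file could not be opened
}
yy::grammar parse;
int result = parse(); // 0 on success
std::fclose(yyin);
return result;
}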
parser.l
%{
#include "sharedText.hpp"
#include "grammar.hpp"
#include "yylex.hpp"
%}
%option noyywrap
%%
<INITIAL>
{
[A-Za-z]+ { return yy::grammar::make_SYMBOL(SharedText()); /* value deliberately not built from yytext */ }
[0-9]+ { return yy::grammar::make_NUMBER(SharedText()); }
"\n" { ++yylineno; }
[ ]+ ;
. ;
}
%%
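For comparison with the construction cost the benchmark currently leaves out, here is a sketch of how the lexer could build each option from flex's yytext and yyleng. `make_text` is a hypothetical helper, not something Bison or flex provides.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper (not in the original files): builds a value of the
// chosen SharedText type from the lexeme flex exposes as yytext/yyleng.
template <typename T>
T make_text(const char* text, std::size_t length);

// Option (1): a heap copy the parser must delete[] by hand; every path that
// forgets to do so leaks.
template <>
inline char* make_text<char*>(const char* text, std::size_t length)
{
    char* copy = new char[length + 1];
    std::memcpy(copy, text, length);
    copy[length] = '\0';
    return copy;
}

// Option (2): freed automatically; short lexemes may fit in the small-string
// buffer and need no heap allocation at all.
template <>
inline std::string make_text<std::string>(const char* text, std::size_t length)
{
    return std::string(text, length);
}

// Option (3): make_shared typically allocates the control block and the
// std::string together; longer text needs a second allocation for the characters.
template <>
inline std::shared_ptr<std::string>
make_text<std::shared_ptr<std::string>>(const char* text, std::size_t length)
{
    return std::make_shared<std::string>(text, length);
}
```

In parser.l a rule could then read `[A-Za-z]+ { return yy::grammar::make_SYMBOL(make_text<SharedText>(yytext, yyleng)); }`, where make_SYMBOL and make_NUMBER are the factory functions Bison generates for `%define api.token.constructor`. With option (1), every action that consumes the token still has to delete[] it; Bison's `%destructor` only covers symbols the parser discards (for example during error recovery).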