Prevent escaped_list_separator from consuming quotes in quoted token


Is it possible to prevent Boost's escaped_list_separator from consuming quotes in a quoted token? Or are there any other ready-to-use constructs to achieve this behavior?
The inner quotes cannot be escaped as the grammar doesn't support that and is defined by a third party.

Example:

std::string input("ID=abcde;PARAM={this;{is};quoted}");
boost::escaped_list_separator<char> separator("", ";", "{}");
boost::tokenizer<boost::escaped_list_separator<char>> tokenizer(input, separator);

for(const auto &token : tokenizer)
{
    std::cout << token << std::endl;
}

This yields

ID=abcde
PARAM=this;is;quoted

but I need

ID=abcde
PARAM=this;{is};quoted

There is 1 solution below


UPDATE: Given the context of MSODBC connection strings, see the update below.

Don't tokenize if you want to parse.

I'll make some assumptions:

  • you want to parse into a map of key/value pairs (like {"ID","abcde"})
  • the nested {} braces are not to be ignored, but must be balanced (in that respect it's weird that they're not interpreted, but maybe you're just not showing the real purpose of the code)

Example: Spirit X3


//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/adapted.hpp>  // for std::pair support
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <map>
#include <stdexcept>    // std::runtime_error
#include <string>
#include <string_view>

using Map = std::map<std::string, std::string>;
using Entry = std::pair<std::string, std::string>;

namespace Grammar {
  using namespace boost::spirit::x3;

  auto entry  = rule<struct Entry_, Entry>{"entry"};
  auto quoted = rule<struct Quoted_, std::string>{"quoted"};

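  auto key        = +~char_("=;");
  // the x3::raw[] directive exposes the matched source text, so nested braces stay in the value
  auto quoted_def = '{' >> raw[ *(quoted | +~char_("{}")) ] >> '}';
  // note: from here on, this `raw` hides the x3::raw directive used above
  auto raw        = *~char_(";");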

  auto value      = quoted | raw;
  auto entry_def  = key >> '=' >> value;

  BOOST_SPIRIT_DEFINE(quoted, entry)
   
  auto full = entry % ';' >> eoi;
} // namespace Grammar

Map parse_map(std::string_view sv) {
  Map m;

  if (!parse(sv.begin(), sv.end(), Grammar::full, m))
    throw std::runtime_error("Parse error");

  return m;
}

#include <fmt/ranges.h>
int main() {
  auto m = parse_map("ID=abcde;PARAM={this;{is};quoted}");
  fmt::print("Result: {}\n", m);
}

Prints

Result: {"ID": "abcde", "PARAM": "this;{is};quoted"}
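
If pulling in Spirit is not an option, the same idea (split on ';' only outside balanced braces, then drop just the outermost pair) can be sketched in plain C++. This is a minimal sketch, not a Boost facility: the function name split_balanced is hypothetical, and it simplifies the grammar above by stripping the outer braces whenever a value merely starts with '{' and ends with '}'.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper (not part of Boost): parses "key=value;key={...}" into
// a map. A ';' inside balanced {} is treated as ordinary text, and only the
// outermost pair of braces around a value is removed.
std::map<std::string, std::string> split_balanced(std::string_view input) {
    std::map<std::string, std::string> result;
    std::size_t pos = 0;

    while (pos < input.size()) {
        // Scan to the next ';' that is not nested inside braces.
        int depth = 0;
        std::size_t end = pos;
        for (; end < input.size(); ++end) {
            char const c = input[end];
            if (c == '{') {
                ++depth;
            } else if (c == '}') {
                if (--depth < 0)
                    throw std::runtime_error("Unbalanced '}'");
            } else if (c == ';' && depth == 0) {
                break;
            }
        }
        if (depth != 0)
            throw std::runtime_error("Unbalanced '{'");

        // Split the entry at the first '='.
        std::string_view const entry = input.substr(pos, end - pos);
        std::size_t const eq = entry.find('=');
        if (eq == std::string_view::npos || eq == 0)
            throw std::runtime_error("Expected key=value");

        std::string_view const key = entry.substr(0, eq);
        std::string_view value = entry.substr(eq + 1);

        // Drop only the outermost braces; inner ones are kept verbatim.
        if (value.size() >= 2 && value.front() == '{' && value.back() == '}')
            value = value.substr(1, value.size() - 2);

        result.emplace(std::string(key), std::string(value));
        pos = end + 1; // skip the separator
    }
    return result;
}
```

Calling split_balanced("ID=abcde;PARAM={this;{is};quoted}") yields the two entries ID -> abcde and PARAM -> this;{is};quoted. Unlike the Spirit grammar, this sketch reports unbalanced braces through distinct exception messages, but it does not reject text following a closing brace.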

UPDATE: MSODBC Connection Strings

Going by the scant documentation:

Applications do not have to add braces around the attribute value after the Driver keyword unless the attribute contains a semicolon (;), in which case the braces are required. If the attribute value that the driver receives includes braces, the driver should not remove them but they should be part of the returned connection string.

A DSN or connection string value enclosed with braces ({}) that contains any of the characters []{}(),;?*=!@ is passed intact to the driver. However, when you use these characters in a keyword, the Driver Manager returns an error when you work with file DSNs, but passes the connection string to the driver for regular connection strings. Avoid using embedded braces in a keyword value.

It follows that a braced value is only ended by } if it appears right before ; or at the end of the connection string, so basically:

auto braced  = '{'  >> *(char_ - ('}' >> (eoi | ';'))) >> '}';
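
To make the termination rule concrete, here is a hand-written equivalent of that one-liner. It is only an illustrative sketch: read_braced is a hypothetical name, and the function assumes its input starts at the opening brace of a value.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: `text` is expected to start with '{'. Returns the
// content up to the first '}' that is directly followed by ';' or by the end
// of the input; any other '}' is treated as ordinary text.
std::optional<std::string> read_braced(std::string_view text) {
    if (text.empty() || text.front() != '{')
        return std::nullopt;

    for (std::size_t i = 1; i < text.size(); ++i) {
        bool const at_end = (i + 1 == text.size());
        if (text[i] == '}' && (at_end || text[i + 1] == ';'))
            return std::string(text.substr(1, i - 1)); // strip both braces
    }
    return std::nullopt; // no terminating '}' found
}
```

With this rule, "{this;{is;quoted}" yields this;{is;quoted, because the only qualifying '}' is the last character. For "{this;{is}};quoted}" the value ends early at this;{is}, since the second '}' is followed by ';'.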

To also retain the original bracing status (so the requirement that braces remain part of the returned connection string can be met), I'd do this:


//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <map>
#include <stdexcept>    // std::runtime_error
#include <string>
#include <string_view>

struct Value {
    bool braced;
    std::string value;
};
using Map = std::map<std::string, Value>;

BOOST_FUSION_ADAPT_STRUCT(Value, braced, value)

namespace Grammar {
  using namespace boost::spirit::x3;

  // only to coerce attribute type, no rule recursion needed anymore:
  template <typename T>
  auto as = [](auto p) { return rule<struct _, T>{"as"} = p; };

  auto key     = +~char_("=;");
  auto braced  = '{'  >> *(char_ - ('}' >> (eoi | ';'))) >> '}';
  auto raw     = *~char_(";");
  auto value   = as<Value>(matches[&lit('{')] >> (braced | raw));
  auto entry   = key >> '=' >> value;
  auto connstr = -entry % ';' >> eoi;
} // namespace Grammar

Map parseConnectionString(std::string_view sv) {
  Map m;

  if (!parse(sv.begin(), sv.end(), Grammar::connstr, m))
    throw std::runtime_error("Parse error");

  return m;
}

int main() {
    for (
        auto connectionString : {
            R"(DSN=dsnname)",
            R"(Driver={Microsoft Access Driver (*.mdb)};DBQ=c:\bin\Northwind.mdb)",
            R"(Driver={Microsoft Excel Driver (*.xls)};DBQ=c:\bin\book1.xls)",
            R"(Driver={Microsoft ODBC for Oracle};Server=ORACLE8i7;Persist Security Info=False;Trusted_Connection=Yes)",
            R"(Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=c:\bin)",
            R"(Driver={SQL Server};Server=(local);Trusted_Connection=Yes;Database=AdventureWorks;)",
            R"(ID=abcde;PARAM={this;{is;quoted})",
            R"(ID=abcde;PARAM={this;{i}s;}s;quoted})", // all fine even if unbalanced
            //
            R"(ID=abcde;PARAM={this;{is}};quoted})", // parse error because of early };
        })
    try {
        std::cout << connectionString << std::endl;
        for (auto& [k, v] : parseConnectionString(connectionString))
        {
            std::cout << " -> " << k << ": " << v.value
                      << (v.braced ? " (braced)" : " (raw)") << std::endl;
        }
    } catch(std::exception const& e) {
        std::cout << " -> " << e.what() << std::endl;
    }
}

Which prints the expected outcome:

DSN=dsnname
 -> DSN: dsnname (raw)
Driver={Microsoft Access Driver (*.mdb)};DBQ=c:\bin\Northwind.mdb
 -> DBQ: c:\bin\Northwind.mdb (raw)
 -> Driver: Microsoft Access Driver (*.mdb) (braced)
Driver={Microsoft Excel Driver (*.xls)};DBQ=c:\bin\book1.xls
 -> DBQ: c:\bin\book1.xls (raw)
 -> Driver: Microsoft Excel Driver (*.xls) (braced)
Driver={Microsoft ODBC for Oracle};Server=ORACLE8i7;Persist Security Info=False;Trusted_Connection=Yes
 -> Driver: Microsoft ODBC for Oracle (braced)
 -> Persist Security Info: False (raw)
 -> Server: ORACLE8i7 (raw)
 -> Trusted_Connection: Yes (raw)
Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=c:\bin
 -> DBQ: c:\bin (raw)
 -> Driver: Microsoft Text Driver (*.txt; *.csv) (braced)
Driver={SQL Server};Server=(local);Trusted_Connection=Yes;Database=AdventureWorks;
 -> Database: AdventureWorks (raw)
 -> Driver: SQL Server (braced)
 -> Server: (local) (raw)
 -> Trusted_Connection: Yes (raw)
ID=abcde;PARAM={this;{is;quoted}
 -> ID: abcde (raw)
 -> PARAM: this;{is;quoted (braced)
ID=abcde;PARAM={this;{i}s;}s;quoted}
 -> ID: abcde (raw)
 -> PARAM: this;{i}s;}s;quoted (braced)
ID=abcde;PARAM={this;{is}};quoted}
 -> Parse error
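
Since the bracing status is kept alongside each value, writing a connection string back out with the original braces is straightforward. The following is a rough sketch under assumptions: to_connection_string is a hypothetical name, Value mirrors the struct from the parser above, and entries come out in std::map key order rather than in their original order.

```cpp
#include <map>
#include <string>

// Same shape as the Value struct used by the parser above.
struct Value {
    bool braced;
    std::string value;
};

// Hypothetical helper: joins the entries with ';' and restores the braces
// around every value that was braced in the input.
std::string to_connection_string(std::map<std::string, Value> const& entries) {
    std::string out;
    for (auto const& [key, v] : entries) {
        if (!out.empty())
            out += ';';
        out += key;
        out += '=';
        if (v.braced) {
            out += '{';
            out += v.value;
            out += '}';
        } else {
            out += v.value;
        }
    }
    return out;
}
```

If the original key order matters, a std::vector of key/value pairs would be a better fit than std::map.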