Boost Tokenizer: Extra Space?


I am using Boost Tokenizer to strip the formatting from coordinates, e.g. (x,y). However, it adds an extra token that looks like a space after the coordinates. There are no spaces in the input, and I can't figure out how to get rid of it.

while (std::getline(input, line)) {
    boost::char_separator<char> sep("(),");
    typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
    tokenizer tok(line, sep);
    for (auto &i : tok) {
        _edges.push_back(i);
    }
}

In the vector, the result is the following:

[x][y][space]


There are 2 best solutions below

Best answer

"I can't figure out how to get rid of this."

Once you have fetched a line of text from the file, but before you start parsing the tokens, you can use boost::trim() to remove any leading and trailing whitespace from the fetched line:

std::getline(iss, line);
boost::trim(line);  // <== added
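
If pulling in Boost's string algorithms just for this is not desirable, a similar effect can probably be had with the standard library alone. The following is a minimal sketch, not part of the original answer: `rtrim_in_place` is a hypothetical helper name, and unlike boost::trim() it only removes trailing whitespace (which is where a stray '\r' would sit), leaving leading whitespace untouched.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from Boost or the original answer):
// removes trailing whitespace, including a stray '\r', in place.
inline void rtrim_in_place(std::string& s) {
    // The cast to unsigned char avoids undefined behavior in std::isspace
    // when char is signed and the value is negative.
    while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {
        s.pop_back();
    }
}
```

It would be called in the same place as boost::trim(), i.e. `rtrim_in_place(line);` right after std::getline() and before the tokenizer is constructed.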

If the end of the line is represented as \r\n (e.g. the file was written on a Windows machine), you will see the behavior you describe. getline uses \n as the default delimiter, so the \r stays at the end of the line and ends up as a token of its own.

#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <boost/tokenizer.hpp>

int main() {

  std::string line;
  std::istringstream iss("(1,2)\r\n");

  std::getline(iss, line);
  std::cout << "length: " << line.length() << std::endl; // 6, includes '\r'

  boost::char_separator<char> sep("(),");
  typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
  tokenizer tok(line, sep);
  for (auto &i : tok) {
    std::cout << "ith tok: " << i << std::endl;
  }

  return 0;
}

Prints:

length: 6
ith tok: 1
ith tok: 2
ith tok: 

To resolve the issue, you could change the delimiter or write a method that parses the coordinate from a stream, like this:

#include <iostream>
#include <algorithm>
#include <vector>
#include <sstream>
#include <iterator>

template <typename CharT, typename CharTraits, typename T>
std::basic_istream<CharT, CharTraits>& operator >>(std::basic_istream<CharT, CharTraits>& is, std::vector<T>& v)
{
  typedef typename std::vector<T> vector_type;
  typedef typename vector_type::size_type size_type;

  CharT ch;
  const size_type size = 2;

  vector_type s{0,0};
  if(is >> ch && ch != '(')
  {
    is.putback(ch);
    is.setstate(std::ios_base::failbit);
  }
  else if(!is.fail())
  {
    for(size_type i = 0; i != size; ++i)
    {
      if(is >> std::ws >> s[i] >> ch && ch != ',')
      {
        is.putback(ch);
        if(i < size - 1)
          is.setstate(std::ios_base::failbit);
        break;
      }
    }

    if(is >> ch && ch != ')')
    {
      is.putback(ch);
      is.setstate(std::ios_base::failbit);
    }
  }

  if(!is.fail())
    v.swap(s);

  return is;
}

int main() {

  std::vector<int> v;
  std::istringstream is("(1, 2)\r\n");
  is >> v;

  std::copy(std::begin(v), std::end(v), std::ostream_iterator<int>(std::cout, " "));
  std::cout << std::endl;

  return 0;
}

Prints:

1 2
