How can I tokenize a CSV file with the Boost Tokenizer library?


I'm having trouble reading each line of a CSV file into a std::string (not a char array) and then tokenizing it.

Here is my code:

#include <iostream>
#include <math.h>
#include "NumCpp.hpp"
#include <cstdlib>
#include <python3.10/Python.h>
#include <fstream>      
#include <vector>
#include <string>
#include <algorithm> 
#include <iterator>
#include <boost/tokenizer.hpp>


using namespace std;
using namespace boost;

typedef tokenizer< escaped_list_separator<char> > Tokenizer;
// I took this typedef from a website

int main()
{
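    string data("DATA.csv");
    ifstream in(data.c_str());
    string line; // buffer for the current line of the file

    while (getline(in, line))
    {
        Tokenizer tok(line);
        // The iterator type must match the Tokenizer typedef, not tokenizer<>
        for (Tokenizer::iterator beg = tok.begin(); beg != tok.end(); ++beg) {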
            cout << *beg << "\n";
        }
    }
    return 0;
}

It just copies the lines from the CSV file one by one.

I don't know how to control the delimiter this tokenizer uses. In the official documentation I only found a small piece of code, and it only works on a string variable that you define yourself:

#include<iostream>
#include<boost/tokenizer.hpp>
#include<string>

int main() {
    using namespace std;
    using namespace boost;
    string s = "This is,  a test";
    tokenizer<> tok(s);
    for (tokenizer<>::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
        cout << *beg << "\n";
    }
}

The output from simple_example_1 is:

This
is
a
test

I would welcome advice on the different arguments of tokenizer, and on how to tokenize the lines I read from the CSV file.

There is 1 solution below

First off, don't (ever) do this:

using namespace std;
using namespace boost;

It leads to at least a dozen name conflicts. In general, avoid using namespace.

The Question

You're using Tokenizer as:

using Tokenizer = boost::tokenizer<boost::escaped_list_separator<char>>;

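This means it uses escaped_list_separator as the separator. A default-constructed separator uses backslash as the escape character, comma as the field separator and double quote as the quote character. To change any of these, pass an explicitly constructed separator as the tokenizer's second argument; its three arguments are, in order, the escape characters, the separator characters and the quote characters: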

E.g.

    Tokenizer tok(line, {"\\", ",", "\""});

Full sample:

#include <iostream>
#include <fstream>
#include <string>
#include <boost/tokenizer.hpp>

using Sep       = boost::escaped_list_separator<char>;
using Tokenizer = boost::tokenizer<Sep>;

int main() {
    std::ifstream in("DATA.csv");

    std::string line;

    while (getline(in, line)) {
        // escape characters, separator characters, quote characters
        Tokenizer tok(line, {"\\", ",", "\""});
        for (auto beg = tok.begin(); beg != tok.end(); ++beg)
            std::cout << *beg << "\n";
    }
}

If you don't want/need the escaping logic, use another separator class, e.g. https://www.boost.org/doc/libs/1_63_0/libs/tokenizer/char_separator.htm

Further Reading

Sometimes tokenizing is just not enough. Consider writing a parser: