Retrieving the results from the std::tr1::regex_search

I am confused about how to fetch the result after calling regex_search from std::tr1::regex. The following sample code demonstrates my issue.

string source = "abcd 16000 ";
string exp = "abcd ([^\\s]+)";
std::tr1::cmatch res;
std::tr1::regex rx(exp);

while (std::tr1::regex_search(source.c_str(), res, rx, std::tr1::regex_constants::match_continuous))
{
   //HOW TO FETCH THE RESULT???????????
   std::cout << " " << res.str() << endl;

   source = res.suffix().str();
}

The regular expression should ideally strip the "abcd" off the string and return 16000.

I see that the cmatch res holds TWO objects. The second object contains the expected result. It has three members (matched, first, second), and their values are {true, "16000", " "}.

My question is: what does the size of this object denote? Why is it 2 in this specific case (res[0] and res[1]) when I have run regex_search only once? And how do I know which object holds the expected result?

Thanks Sunil

There is 1 best solution below

As stated here:

match[0]: represents the entire match
match[1]: represents the first capturing group (sub-match)
match[2]: represents the second capturing group, and so forth

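This means match[0] holds, in this case, the whole text matched by the expression (abcd 16000, without the trailing space of the source), while match[1] contains the content of your capturing group (16000). That is why the size is 2 after a single regex_search call: one entry for the whole match plus one per capturing group.
If there were, for example, a second capturing group in your regex, you would get a third object in the match collection, and so on.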
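
To tie this back to the question, here is a minimal sketch of reading the capturing group. It assumes a C++11 compiler and uses std::regex with std::smatch in place of the std::tr1 names from the question; the helper name match_parts is made up for this illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every sub_match of the first match as a string.
// Index 0 is the whole match, index 1 the first capturing group, and so on.
// The result is empty when the pattern does not match at all.
std::vector<std::string> match_parts(const std::string& source,
                                     const std::string& pattern)
{
    const std::regex rx(pattern);
    std::smatch res;
    std::vector<std::string> parts;
    if (std::regex_search(source, res, rx)) {
        for (std::size_t i = 0; i < res.size(); ++i) {
            parts.push_back(res[i].str());
        }
    }
    return parts;
}
```

Called as match_parts("abcd 16000 ", "abcd ([^\\s]+)"), this should return two entries: "abcd 16000" (the whole match) and "16000" (the capturing group). So the value the question asks for is res[1].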

I understand visualized problems and solutions better, so let's do this:
See the demo at regex101.

See the two colors in the text field containing the test string?
The green color is the background of your capturing group, while the blue color marks everything else matched by the expression but not captured by any group.
In other words: blue+green is the equivalent of match[0], and green of match[1], in your case.

This way you can always know which of the objects in match refers to which capturing group:
Initialize a counter in your head, starting at 0. Now go through the regex from left to right and add 1 for each opening ( of a capturing group, until you reach the opening bracket of the group you want to extract. Closing brackets, escaped brackets \( and groups opening with (? such as the non-capturing (?: ... ) do not count. The number in your head is the array index.
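
Group indices follow the order of the opening parentheses of capturing groups, which a nested pattern makes easy to check. The following is a small sketch under the same assumption as before (C++11 std::regex instead of std::tr1); group_text and group_count are hypothetical helper names, and the pattern is an invented example.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: text of sub_match `index` for the first match,
// or an empty string if there is no match or no such index.
std::string group_text(const std::string& pattern,
                       const std::string& text,
                       std::size_t index)
{
    const std::regex rx(pattern);
    std::smatch res;
    if (std::regex_search(text, res, rx) && index < res.size()) {
        return res[index].str();
    }
    return std::string();
}

// Number of capturing groups in the pattern; non-capturing groups
// such as (?:x) are not included in mark_count().
std::size_t group_count(const std::string& pattern)
{
    return std::regex(pattern).mark_count();
}
```

For the pattern "(a(b))(?:x)(c)" and the text "abxc", index 0 gives "abxc", index 1 gives "ab" (the outer group opens first), index 2 gives "b", and index 3 gives "c". group_count reports 3, because (?:x) does not capture.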

EDIT
Regarding your comment on checking res[0].first:

According to the Boost documentation, the member first of the sub_match class only denotes the position of the start of the match, while second denotes the position of the end of the match.
Both are iterators into the searched sequence; for a cmatch they are plain const char* pointers. Printing such a pointer therefore outputs the source string from that position up to its terminating null, not just the matched text (which may be the full source if the match starts at index zero!).

Consider the following program (VC++10):

#include "stdafx.h"
#include <regex>
#include <iostream>
#include <string>

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    string source = "abcdababcdefg";
    string exp = "ab";
    tr1::cmatch res;
    tr1::regex rx(exp);

    tr1::regex_search(source.c_str(), res, rx);

    for (size_t n = 0; n < res.size(); ++n) 
    { 
        std::cout << "submatch[" << n << "]: matched == " << std::boolalpha 
            << res[n].matched << 
            " at position " << res.position(n) << std::endl; 
        std::cout << "  " << res.length(n) 
            << " chars, value == " << res[n] << std::endl; 
    }
    std::cout << std::endl; 

    cout << "res[0].first: " << res[0].first << " - res[0].second: " << res[0].second << std::endl;
    cout << "res[0]: " << res[0];

    cin.get();

    return 0;
}

Execute it and look at the output. The first (and only) match is, obviously, the first two chars ab, so this is the whole matched string and the reason why res[0] == "ab".
Now, knowing that printing .first and .second gives us the substrings of the source starting at the beginning of the match and at the end of the match respectively, the output shouldn't be confusing anymore.
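
To get only the matched text from the two pointers, build a string from the range they delimit. The sketch below contrasts the three readings; it assumes C++11 std::regex with std::cmatch rather than the std::tr1 names above, and MatchView and inspect are names invented for this example.

```cpp
#include <regex>
#include <string>

// What each way of reading the whole match (res[0]) produces.
struct MatchView {
    std::string from_first;   // source text from res[0].first to the end
    std::string from_second;  // source text from res[0].second to the end
    std::string matched;      // only the text between first and second
};

// Hypothetical helper: searches a C string and reports the three views.
// All fields stay empty when the pattern does not match.
MatchView inspect(const char* source, const char* pattern)
{
    const std::regex rx(pattern);
    std::cmatch res;
    MatchView view;
    if (std::regex_search(source, res, rx)) {
        // first and second are const char* pointers into `source`.
        view.from_first = res[0].first;
        view.from_second = res[0].second;
        view.matched = std::string(res[0].first, res[0].second);
    }
    return view;
}
```

For inspect("abcdababcdefg", "ab"), from_first is the full source "abcdababcdefg", from_second is "cdababcdefg", and matched is "ab", which is the same text that res[0].str() returns.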