Some sample code. This is the c++11 part using cregex_iterator:
std::chrono::steady_clock::time_point begin0 = std::chrono::steady_clock::now();
regex re("<option[\\s]value[\\s]*=[\\s]*\"([^\">]*)\"[\\s]*[^>]*>", regex::icase);
int found = 0;
for (std::cregex_iterator i = std::cregex_iterator(input, input + input_length, re);
i != std::cregex_iterator();
++i)
{
found++;
if (found < 10000) continue;
break;
}
std::chrono::steady_clock::time_point end0 = std::chrono::steady_clock::now();
This is the pcre part. The regexp is all the same.
std::chrono::steady_clock::time_point begin4 = std::chrono::steady_clock::now();
const char *pError = NULL;
int errOffset;
int options = PCRE_MULTILINE | PCRE_CASELESS;
const char* regexp = "<option[\\s]value[\\s]*=[\\s]*\"([^\">]*)\"[\\s]*[^>]*>";
pcre* pPcre = pcre_compile(regexp, options, &pError, &errOffset, 0);
int offset = 0;
int matches = -1;
int pMatches[6];
while (offset < input_length)
{
matches = pcre_exec(pPcre,NULL, input, input_length, offset,0, pMatches,6);
if (matches >= 1)
{
found++;
offset = pMatches[1];
if (found < 10000) continue;
break; // find match
}
else
offset = input_length;
}
std::chrono::steady_clock::time_point end4 = std::chrono::steady_clock::now();
The result shows pcre is 100 times faster than c++11. I found some vector copy and resize in c++11 implementation. Are there some other reasons?
PCRE benefits from some optimizations known as start-up optimizations which are configured to be enabled by default. These optimizations include:
Superficial pattern analyzing:
Furthermore, if input string doesn't have a special character like
>
, a fast failure is supposed to be thrown. You should know that performance can depend on input string heavily as well.Run below pattern:
over this input string (watch that period):
and compare result (Live demo).