Borland string::find bug

2.6k Views Asked by At

I'm supporting a C++ application written using Borland C++ Builder 5.02 (from 1997). The find() method on the Borland string class does not behave how I would expect:

#include <cstring>
#include <iostream>

int main (int argc, char *argv[])
{
   string needle = "length == eighteen";
   string haystack = "<" + needle + ">";
   if (haystack.find(needle) != NPOS)
      cout << "Found it!" << endl;
   else
      cout << "Not found" << endl;

   return 0;
}

This program outputs Not found. If I change the needle to something shorter it outputs Found it!. If I exchange the angle brackets for some other characters it finds it. Spaces work, but parentheses also don't.

Note that I am using the Borland string library here: if I #include <string> and use std::string instead then it works exactly how I would expect. Sadly changing the whole application to use STL strings is not a feasible answer!

From the documentation it seems that Borland uses a hash-based algorithm for string search. I can't find any more details about this, and I've stepped through the disassembly but am not much the wiser.

I find it very hard to believe that this is really a bug in the string library, particularly since if it were then I would expect to be able to find an article or something about it. I can't find any such information.

However, I've run out of ideas! Is this a known bug? Is there a fix?

EDIT: Having looked again at the disassembly, I think it's trying to do something like the Rabin-Karp algorithm, where the hash function is calculated mod 33554393 (the largest prime < 2^25). It could well be the polynomial hash function with a base of 32 (i.e. a_0 + 32 a_1 + 32^2 a_2 + .. + 32^n a_n) but that's just a hunch. Sounds like a possible overflow as Daniel Fischer suggested.

3

There are 3 best solutions below

0
On BEST ANSWER

I have found a reference from 1998 suggesting Borland's implementation of searching strings has a bug:

https://groups.google.com/forum/?fromgroups=#!searchin/borland.public.cpp.language/cstring$20bug/borland.public.cpp.language/XBzjaJmCYpk/gtMPm-j8jugJ

Also, it appears that at some point in history the C++ commitee decided that a string class would be part of standard C++, and cstring's string class is a remnant of this:

https://groups.google.com/forum/?fromgroups=#!searchin/borland.public.cpp.language/borland$20cstring/borland.public.cpp.language/2psY2seRmS4/ywVrqwU1C2wJ

2
On

If you have the original BC++ 5.02 installation disk, the string class source is found under BC5\SOURCE\RTL\SOURCE\STRING.

Here is an excerpt from the code of the string::find_case_index() function (called by string::find() ):

const long q = 33554393L;
const long q32 = q<<5;

size_t testlength = length() - startindex;
size_t patternlength = patl = strlen(cp);
if( testlength < patternlength )
    return NPOS;
if( patternlength == 0 )
    return 0;

long patternHash = 0;
long testHash = 0;

const char _FAR *testP = c_str()+startindex;
const char _FAR *patP = cp;
long x = 1;
size_t i = patternlength-1;

while( i-- )
    x = (x<<5)%q;

for( i=0; i<patternlength; i++ )
    {
    patternHash = ( (patternHash<<5) + *patP++  ) % q;
    testHash    = ( (testHash   <<5) + *testP++ ) % q;
    }

testP = c_str()+startindex;
const char _FAR *end = testP + testlength - patternlength;

while (1)
    {

    if(testHash == patternHash)
        if( !get_paranoid_check_flag() ||
            !strncmp( testP, cp, patternlength) )
          return (size_t)(testP-c_str());

    if( testP >= end )
        break;

    // Advance & calculate the new hash value:
    testHash = ( testHash + q32 - *testP * x                 ) % q;
    testHash = ( (testHash<<5)  + *(patternlength + testP++) ) % q;
    }
return NPOS;          // Not found.
4
On

You are not using a Borland string library. String (capital S) is the Borland string class. string (lowercase s), which is the exact same thing as std::string, is the STL string class, which is NOT a Borland implementation (the STL in BCB5 was the RogueWave STL). Your use of #include <cstring> is likely bringing std::string into the global namespace, which is why your code compiles. But you really should be using #include <string> and std::string instead. As for NPOS, you should be using string::npos instead, since that is what string::find() actually returns.

#include <cstring>
#include <iostream>

int main (int argc, char *argv[])
{
   string needle = "length == eighteen";
   string haystack = "<" + needle + ">";
   if (haystack.find(needle) != string::npos)
      cout << "Found it!" << endl;
   else
      cout << "Not found" << endl;

   return 0;
}

Or:

#include <string>
#include <iostream>

int main (int argc, char *argv[])
{
   std::string needle = "length == eighteen";
   std::string haystack = "<" + needle + ">";
   if (haystack.find(needle) != std::string::npos)
      std::cout << "Found it!" << std::endl;
   else
      std::cout << "Not found" << std::endl;

   return 0;
}