Finding any substring of a given length inside another string

50 Views Asked by At

Let assume we have 2 strings ($strA and $strB) of some MBs length and we want to find if there exists any 8-bytes substring of $strA inside $strB. My current approach works however is unacceptable slow:

function FindAnySubstring($strA,$strB)
{
    for( $i=0; $i<strlen($strA)-8; $i++ )
    {
        $toFind=subStr($strA,$i,8);
        $pos=strpos($strB,$toFind);
        if( $pos === FALSE )
            continue;
        return [$i,$pos];
    }
    return FALSE;
}

Can it be done in a better way?

Sample data - Well, probably I'm not able to add big files, but let assume that we can use shorter samples just for visualisation:

$strA = 'abcdefghijKLMNOPQRstuvxyz';
$strB = 'ldsf32vKLMNOPQR4fa156232543';

Expected result: [11,8]

(at 11th pos in $strA and 8th pos in $strB)

Current execution time: For 3,321,792 bytes long $strA and $strB it takes between 0 and 2530 seconds to find a solution (!) - depending on position of the substring. I would expect 2 orders faster algorithm.

Considered ideas - I have some ideas:

  • find shorter substring (substring of the substring, starting from 1 byte) and if found - check the rest; tested with no improvements
  • use pararelling; always work, however harder to code
2

There are 2 best solutions below

1
manout On

There are a few improvements you can make to potentially optimize its performance:

Early Exit: You can exit the loop as soon as a match is found, rather than continuing to search for additional matches. This can significantly reduce the number of unnecessary iterations.

Precompute Lengths: Instead of repeatedly calling strlen($strA) inside the loop, you can precompute the length of $strA before the loop starts to improve efficiency.

Here's an updated version of your function with these improvements:

function FindAnySubstring($strA, $strB)
{
   $lenA = strlen($strA);
   $lenB = strlen($strB);

   for ($i = 0; $i <= $lenA - 8; $i++)
   {
      $toFind = substr($strA, $i, 8);
      $pos = strpos($strB, $toFind);

      if ($pos !== false)
      {
          return [$i, $pos];
      }
   }

  return false;
}
4
AmiR On

I wrote this , Also you can change it to find all substrings :

function FindAnySubstring($strA,$strB) {
    $a2 = str_split($strB);
    foreach (str_split($strA) as $first => $firstw) :
        foreach ($a2 as $second => $secw) :
            if ($first === $secw) :
                return [$firstm+1,$second+1];
            endif;
        endforeach;
    endforeach;
}