I have a byte array(can get very large, past 32 million bytes), and I need to replace some subarrays that with other subarrays of the same length.
My current approach is search the byte array for all the subarrays I need to replace, every time I find one add the index of the subarray to a list, and continue.
My code is as follows. I have a nagging feeling that this cannot be efficient, because 32 million bytes is taking greater than 10 seconds to finish searching and replacing. I pass it 8 strings to replace, so it basically ends up searching for 16 subarrays.
Anyone see either any flaws with my algorithm or a more efficient one?
P.S. I don't actually replace them in this code, just find the indices. My code for that should be perfectly efficient.
public class Search
{
public List<int> positions;
public List<int> lengths;
private List<byte[]> stringsToSearchFor;
public Search(List<string> strings){
stringsToSearchFor = new List<byte[]>();
positions = new List<int>();
lengths = new List<int>();
foreach (string tempString in strings){
stringsToSearchFor.Add(Encoding.ASCII.GetBytes(tempString));
stringsToSearchFor.Add(Encoding.Unicode.GetBytes(tempString));
}
}
public void SearchBytes(byte[] haystack){
int[] arrayOfInt = new int[stringsToSearchFor.Count];
bool[] arrayOfBoolean = new bool[stringsToSearchFor.Count];
for (var i = 0; i < haystack.Length; i++){
byte currentByte = haystack[i];
for (int stringCounter = 0; stringCounter < arrayOfBoolean.Length; stringCounter++)
{
byte[] stringLookFor = stringsToSearchFor.ElementAt(stringCounter);
byte currentStringByte = stringLookFor[arrayOfInt[stringCounter]];
//Saying the current byte is the desired one
if (currentStringByte == currentByte)
{
if (arrayOfInt[stringCounter] + 1 == stringLookFor.Length){
positions.Add(i - stringLookFor.Length + 1);
lengths.Add(stringLookFor.Length);
arrayOfInt[stringCounter] = 0;
}
else
{
arrayOfInt[stringCounter]++;
}
}
else
{
arrayOfInt[stringCounter] = 0;
}
}
}
return;
}
}
You're basically doing a brute force search. Instead of doing brute force, you could do something more akin to the Knuth-Morris-Pratt or Rabin-Karp string searching algorithms (but instead of searching for character sequences in strings, you're searching for byte sequences in arrays).