Antivirus scan speed optimization


I have been developing an antivirus in VB.NET. The virus scanner works fine, but I am looking for ways to optimize the scanning speed, because large files take forever.

I detect viruses by matching binary signatures (stored as hex strings). I don't think I need to search the whole file to decide whether it is a virus; I suspect there is a specific location and a specific number of bytes that I should scan instead. If anyone can help with this, please do.

Thanks in advance.

BTW, the virus signatures come from the hex signature collection of the ClamAV antivirus.


There are 2 best solutions below

BlueMonkMN

Perhaps your pattern scan is inefficient. I can scan for a pattern in a 7 MB file in about 1/20th of a second using code like the following.

Note that if you really want to use code like this, you have to make a correction: you can't always set matchedLength back to 0 when you realize that you aren't looking at a match. For example, searching for "AAB" in "AAAB" would miss the match, because bytes already consumed by the failed attempt are themselves the start of the real match. It does work for this particular pattern, whose bytes are all distinct, as long as the mismatching byte is itself re-tested against the start of the pattern. In general you have to pre-process the pattern so you know what to reset to when you don't find a match, but that will not add significant time to the algorithm. I could make the effort to correctly complete the algorithm, but I won't do that now if your question is just about performance. I'm just demonstrating that it is possible to scan large files quickly if you do it correctly.

Sub Main(ByVal args As String())
  If args.Length < 1 Then Return
  Dim startTime As Long = Stopwatch.GetTimestamp()
  Dim pattern As Byte()
  pattern = System.Text.Encoding.UTF8.GetBytes("SFMB")
  Dim bufferSize As Integer = 4096
  Using reader As New System.IO.FileStream(args(0), IO.FileMode.Open, _
     Security.AccessControl.FileSystemRights.Read, IO.FileShare.Read, bufferSize, IO.FileOptions.SequentialScan)
     Dim buffer(bufferSize - 1) As Byte
     Dim readLength = reader.Read(buffer, 0, bufferSize)
     Dim matchedLength As Integer = 0
     Dim searchPos As Integer = 0
     Dim fileOffset As Long = 0 ' Long, so offsets in files larger than 2 GB do not overflow
     Do While readLength > 0
        For searchPos = 0 To readLength - 1
           If pattern(matchedLength) = buffer(searchPos) Then
              matchedLength += 1
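           Else
              ' Simplified reset: start over, but let the current byte begin a new match.
              ' Only correct when no prefix of the pattern recurs inside it (true for "SFMB").
              matchedLength = If(buffer(searchPos) = pattern(0), 1, 0)
           End If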
           If matchedLength = pattern.Length Then
              Console.WriteLine("Found pattern at position {0}", fileOffset + searchPos - matchedLength + 1)
              matchedLength = 0
           End If
        Next
        fileOffset += readLength
        readLength = reader.Read(buffer, 0, bufferSize)
     Loop
  End Using
  Dim endTime As Long = Stopwatch.GetTimestamp()
  Console.WriteLine("Search took {0} seconds", (endTime - startTime) / Stopwatch.Frequency)
End Sub
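
The pre-processing step mentioned above (knowing what to reset to after a mismatch) is what the Knuth-Morris-Pratt algorithm does with its fallback table. The following is only a minimal sketch of that idea, not the completed version of the code above; the names KmpSketch, BuildFallback and FindFirst are made up for illustration, and it searches an in-memory array rather than a stream.

```vbnet
Imports System

Module KmpSketch
   ' fallback(i) = length of the longest proper prefix of pattern(0..i)
   ' that is also a suffix of pattern(0..i).
   Function BuildFallback(ByVal pattern As Byte()) As Integer()
      Dim fallback(pattern.Length - 1) As Integer
      Dim k As Integer = 0
      For i As Integer = 1 To pattern.Length - 1
         Do While k > 0 AndAlso pattern(i) <> pattern(k)
            k = fallback(k - 1)
         Loop
         If pattern(i) = pattern(k) Then k += 1
         fallback(i) = k
      Next
      Return fallback
   End Function

   ' Returns the offset of the first occurrence of pattern in data, or -1 if there is none.
   Function FindFirst(ByVal data As Byte(), ByVal pattern As Byte()) As Integer
      If pattern.Length = 0 Then Return 0
      Dim fallback As Integer() = BuildFallback(pattern)
      Dim matched As Integer = 0
      For pos As Integer = 0 To data.Length - 1
         ' On a mismatch, fall back to the longest prefix that can still be a match
         ' instead of resetting to 0.
         Do While matched > 0 AndAlso data(pos) <> pattern(matched)
            matched = fallback(matched - 1)
         Loop
         If data(pos) = pattern(matched) Then matched += 1
         If matched = pattern.Length Then Return pos - matched + 1
      Next
      Return -1
   End Function

   Sub Main()
      Dim data As Byte() = System.Text.Encoding.ASCII.GetBytes("AAAB")
      Dim pattern As Byte() = System.Text.Encoding.ASCII.GetBytes("AAB")
      Console.WriteLine(FindFirst(data, pattern)) ' prints 1
   End Sub
End Module
```

Because the only state carried from one byte to the next is the matched counter, the same loop should drop into the buffered reader above without changes to how the file is read.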

EDIT

Here are some thoughts about how you could match multiple patterns at once. This is just off the top of my head and I have not tried to compile the code:

Create a class to contain information about the status of a pattern:

Class PatternInfo
   Public pattern As Byte()
   Public matchedBytes As Integer
End Class

Declare a variable to track all the patterns that you need to check and index them by the first byte of the pattern for quick lookup:

Dim patternIndex As Dictionary(Of Byte, IEnumerable(Of PatternInfo))
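
As a rough illustration of how such an index could be filled from hex signature strings, here is a sketch. HexToBytes and BuildIndex are hypothetical helper names, and the sketch assumes plain hex strings with an even number of digits; as far as I know, ClamAV signatures can also contain wildcard syntax, which this does not handle.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Class PatternInfo
   Public pattern As Byte()
   Public matchedBytes As Integer
End Class

Module PatternIndexSketch
   ' Converts a plain hex string such as "53464D42" into bytes.
   ' Assumes an even number of hex digits and no wildcard characters.
   Function HexToBytes(ByVal hex As String) As Byte()
      Dim bytes(hex.Length \ 2 - 1) As Byte
      For i As Integer = 0 To bytes.Length - 1
         bytes(i) = Convert.ToByte(hex.Substring(i * 2, 2), 16)
      Next
      Return bytes
   End Function

   ' Groups the patterns by their first byte so a scanner can look up candidates quickly.
   Function BuildIndex(ByVal hexSignatures As IEnumerable(Of String)) _
         As Dictionary(Of Byte, IEnumerable(Of PatternInfo))
      Dim index As New Dictionary(Of Byte, IEnumerable(Of PatternInfo))
      For Each hex As String In hexSignatures
         Dim info As New PatternInfo() With {.pattern = HexToBytes(hex), .matchedBytes = 0}
         If info.pattern.Length = 0 Then Continue For
         Dim existing As IEnumerable(Of PatternInfo) = Nothing
         If index.TryGetValue(info.pattern(0), existing) Then
            ' Every value stored in the index is a List, so this cast is safe here.
            DirectCast(existing, List(Of PatternInfo)).Add(info)
         Else
            Dim bucket As New List(Of PatternInfo)
            bucket.Add(info)
            index.Add(info.pattern(0), bucket)
         End If
      Next
      Return index
   End Function

   Sub Main()
      Dim signatures As String() = {"53464D42", "5346", "4D5A"}
      Dim index As Dictionary(Of Byte, IEnumerable(Of PatternInfo)) = BuildIndex(signatures)
      Console.WriteLine("Patterns starting with 0x53: {0}", index(&H53).Count())
      Console.WriteLine("Distinct first bytes: {0}", index.Count)
   End Sub
End Module
```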

Check all the patterns that are currently a potential match to see if the next byte also matches on these patterns; if not, stop looking at that pattern at that position:

Dim activePatterns As New LinkedList(Of PatternInfo)
Dim newPatterns As IEnumerable(Of PatternInfo)

For Each activePattern In activePatterns.ToArray()
   If activePattern.pattern(activePattern.matchedBytes) = buffer(searchPos) Then
      activePattern.matchedBytes += 1
      If activePattern.matchedBytes >= activePattern.pattern.Length Then
         Console.WriteLine("Found pattern at position {0}", searchPos - activePattern.matchedBytes + 1)
         ' A completed match must leave the list, or the next byte would index past the end of the pattern
         activePatterns.Remove(activePattern)
      End If
   Else
      activePatterns.Remove(activePattern)
   End If
Next

See if the current byte looks like the beginning of a new pattern that you would be searching for; if so, add it to the list of active patterns:

If patternIndex.TryGetValue(buffer(searchPos), newPatterns) Then
   For Each newPattern In newPatterns
      ' LinkedList(Of T) exposes AddLast rather than Add
      activePatterns.AddLast(New PatternInfo() With { _
         .pattern = newPattern.pattern, .matchedBytes = 1 })
   Next
End If
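
Putting the fragments above together, a self-contained version might look roughly like the sketch below. It is an untuned illustration under a few assumptions: it scans an in-memory array, tracks patterns by their index in the input array instead of copying PatternInfo objects, and the names MultiPatternSketch, Candidate and FindAll are invented for this example.

```vbnet
Imports System
Imports System.Collections.Generic

Module MultiPatternSketch
   ' One in-progress match: which pattern it is and how many of its bytes have matched so far.
   Class Candidate
      Public patternId As Integer
      Public matchedBytes As Integer
   End Class

   ' Returns (offset, pattern index) pairs for every occurrence of every pattern in data.
   Function FindAll(ByVal data As Byte(), ByVal patterns As Byte()()) _
         As List(Of KeyValuePair(Of Integer, Integer))
      ' Index the pattern numbers by first byte for quick lookup.
      Dim byFirstByte As New Dictionary(Of Byte, List(Of Integer))
      For p As Integer = 0 To patterns.Length - 1
         If patterns(p).Length = 0 Then Continue For
         Dim bucket As List(Of Integer) = Nothing
         If Not byFirstByte.TryGetValue(patterns(p)(0), bucket) Then
            bucket = New List(Of Integer)
            byFirstByte.Add(patterns(p)(0), bucket)
         End If
         bucket.Add(p)
      Next

      Dim matches As New List(Of KeyValuePair(Of Integer, Integer))
      Dim active As New List(Of Candidate)
      For pos As Integer = 0 To data.Length - 1
         Dim stillActive As New List(Of Candidate)
         ' Advance every candidate whose next byte matches; the others are dropped.
         For Each c As Candidate In active
            If patterns(c.patternId)(c.matchedBytes) = data(pos) Then
               c.matchedBytes += 1
               If c.matchedBytes = patterns(c.patternId).Length Then
                  matches.Add(New KeyValuePair(Of Integer, Integer)(pos - c.matchedBytes + 1, c.patternId))
               Else
                  stillActive.Add(c)
               End If
            End If
         Next
         ' Start a new candidate for each pattern that begins with the current byte.
         Dim starters As List(Of Integer) = Nothing
         If byFirstByte.TryGetValue(data(pos), starters) Then
            For Each p As Integer In starters
               If patterns(p).Length = 1 Then
                  matches.Add(New KeyValuePair(Of Integer, Integer)(pos, p))
               Else
                  stillActive.Add(New Candidate() With {.patternId = p, .matchedBytes = 1})
               End If
            Next
         End If
         active = stillActive
      Next
      Return matches
   End Function

   Sub Main()
      Dim ascii As System.Text.Encoding = System.Text.Encoding.ASCII
      Dim patterns As Byte()() = {ascii.GetBytes("SFMB"), ascii.GetBytes("FM"), ascii.GetBytes("MZ")}
      For Each m As KeyValuePair(Of Integer, Integer) In FindAll(ascii.GetBytes("SSFMBMZ"), patterns)
         Console.WriteLine("Pattern {0} found at offset {1}", m.Value, m.Key)
      Next
   End Sub
End Module
```

Since every candidate start is tracked separately, this version does not need a fallback table, but the active list can grow when many signatures share the same first bytes, so it is probably not the fastest option for a large signature set.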

Grijesh Chauhan

Well, it all depends: what is your definition of a virus signature?
I suggest you parse the executable and scan only the code section.
But a polymorphic virus keeps its malicious code in the data section in encrypted form, so I am not entirely sure.
Are you using some kind of n-gram technique, or just mining frequent hex codes?
Scan time is a very important issue!
I once wrote a command-line scanner that could find an infected file in less than a second, and get through tons of files in seconds.
The technique was frequent opcode mining.
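
To make the "scan only the code section" suggestion concrete, here is a rough sketch that locates the executable sections of a Windows PE file from its headers. It assumes a PE image on a little-endian machine (BitConverter uses the machine's byte order); the names CodeSectionSketch and FindCodeSections are hypothetical, and as noted above, restricting the scan this way would miss malicious code hidden in data sections.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module CodeSectionSketch
   ' Section characteristic flags from the PE format.
   Const IMAGE_SCN_CNT_CODE As UInteger = &H20UI
   Const IMAGE_SCN_MEM_EXECUTE As UInteger = &H20000000UI

   ' Returns (file offset, length) pairs for the raw data of each section flagged
   ' as code or executable. Returns an empty list if the image is not a valid PE file.
   Function FindCodeSections(ByVal image As Byte()) As List(Of KeyValuePair(Of Integer, Integer))
      Dim result As New List(Of KeyValuePair(Of Integer, Integer))
      ' DOS header: "MZ" magic, with the offset of the PE header stored at 0x3C.
      If image.Length < &H40 OrElse image(0) <> &H4D OrElse image(1) <> &H5A Then Return result
      Dim peOffset As Integer = BitConverter.ToInt32(image, &H3C)
      If peOffset < 0 OrElse peOffset > image.Length - 24 Then Return result
      ' PE signature: "PE" followed by two zero bytes.
      If image(peOffset) <> &H50 OrElse image(peOffset + 1) <> &H45 OrElse _
         image(peOffset + 2) <> 0 OrElse image(peOffset + 3) <> 0 Then Return result
      ' The 20-byte COFF header follows the 4-byte signature.
      Dim sectionCount As Integer = BitConverter.ToUInt16(image, peOffset + 6)
      Dim optionalHeaderSize As Integer = BitConverter.ToUInt16(image, peOffset + 20)
      Dim tableOffset As Integer = peOffset + 24 + optionalHeaderSize
      For i As Integer = 0 To sectionCount - 1
         Dim header As Integer = tableOffset + i * 40 ' each section header is 40 bytes
         If header + 40 > image.Length Then Exit For
         Dim rawSize As UInteger = BitConverter.ToUInt32(image, header + 16)
         Dim rawOffset As UInteger = BitConverter.ToUInt32(image, header + 20)
         Dim flags As UInteger = BitConverter.ToUInt32(image, header + 36)
         If (flags And (IMAGE_SCN_CNT_CODE Or IMAGE_SCN_MEM_EXECUTE)) <> 0UI Then
            ' Skip sections whose raw data would run past the end of the file.
            If CLng(rawOffset) + CLng(rawSize) <= image.Length Then
               result.Add(New KeyValuePair(Of Integer, Integer)(CInt(rawOffset), CInt(rawSize)))
            End If
         End If
      Next
      Return result
   End Function

   Sub Main(ByVal args As String())
      If args.Length < 1 Then Return
      ' Reads the whole file for simplicity; only the headers are actually needed here.
      Dim image As Byte() = File.ReadAllBytes(args(0))
      For Each section As KeyValuePair(Of Integer, Integer) In FindCodeSections(image)
         Console.WriteLine("Code section at offset {0}, {1} bytes", section.Key, section.Value)
      Next
   End Sub
End Module
```

The returned ranges could then be handed to the pattern scanner so that it only reads those parts of the file.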