I'm writing a program in VB.NET to detect plagiarized articles. The program uses Google as its detection tool. For example, take a piece of an article like this:
Computer is one of today's technology that is quite popular
The algorithm I use is this: I wrap the sentence in double quotes, one at the start and one at the end, and submit it as a Google search. The search keywords look like this: "Computer is one of today's technology that is quite popular". If Google finds a website that contains that exact sentence, I treat the article as plagiarized.
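A minimal sketch of that query step, assuming the sentence should be percent-encoded before it goes into the address. `BuildExactPhraseUrl` and the module name are made up for this illustration and are not part of my program:

```vbnet
Imports System

Module ExactPhraseQuery
    ' Hypothetical helper: wraps the sentence in double quotes and
    ' percent-encodes the result so spaces and quotes survive in the URL.
    Function BuildExactPhraseUrl(sentence As String) As String
        Dim quoted As String = """" & sentence.Trim() & """"
        Return "https://www.google.co.id/search?q=" & Uri.EscapeDataString(quoted)
    End Function

    Sub Main()
        Console.WriteLine(BuildExactPhraseUrl("Computer is one of today's technology that is quite popular"))
    End Sub
End Module
```

The encoded double quotes come out as `%22`, which matches the `%22` I put around the phrase by hand in my own code.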
However, my program has to check hundreds of articles, so it automatically opens Google and runs many searches in a short time. It opens Google through a WebBrowser control.
Because it has to check hundreds of articles, the WebBrowser eventually starts showing an error like this:
That makes sense, because I really am running a bot that searches Google.
Is there any way to overcome or work around this problem?
This is my code (comments translated from Indonesian):
Dim totallink As String = ""
tempcek = tempstrline.Substring(start, selesai - start)
'search Google for the exact phrase (URL-encoded so spaces and punctuation survive)
WebBrowser1.Navigate("https://www.google.co.id/search?q=%22" & Uri.EscapeDataString(tempcek) & "%22")
'wait for the WebBrowser to finish loading
Dim sw As New Stopwatch
sw.Start()
Do Until sw.Elapsed.TotalSeconds >= 5
    Application.DoEvents()
Loop
sw.Stop()
sw.Reset()
'------
'check the page source in the browser
'pattern : </a></h3><div class="s">
'pattern2 : href="
Dim pattern As String = "</a></h3><div class=" & Chr(34) & "s" & Chr(34) & ">"
Dim pattern2 As String = "href=" & Chr(34)
'"tidak ditemukan" is the "not found" text on the Indonesian results page
If WebBrowser1.Document.Body.InnerHtml.Contains(pattern) AndAlso Not WebBrowser1.Document.Body.InnerHtml.ToLower().Contains("tidak ditemukan") Then
    Dim tempsc As String = WebBrowser1.Document.Body.InnerHtml
    'collect all links in the Google search results
    While tempsc.IndexOf(pattern) > -1
        'back up at most 300 characters before the pattern to reach the link
        tempsc = tempsc.Substring(Math.Max(0, tempsc.IndexOf(pattern) - 300))
        Dim templink As String = tempsc.Substring(tempsc.IndexOf(pattern2) + 6)
        templink = templink.Substring(0, templink.IndexOf(Chr(34)))
        'skip past this match so the pattern check above does not find it again
        tempsc = tempsc.Substring(Math.Min(350, tempsc.Length))
        totallink &= templink & "."
        hasil(idxhasil) = totallink & ";" & i & "," & tempcek
    End While
Else
    'next check: the plagiarized string cannot be extended any further
    idxhasil += 1
    start = tempstrline.IndexOf(" ", selesai) + 1
    hitungspasike4(selesai, tempstrline)
End If
'next check: the string can still be extended and checked for plagiarism again
selesai = tempstrline.IndexOf(" ", selesai + 1)
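
To make the link-collecting step easier to follow on its own, here is a standalone sketch of the same idea. It is not my actual code: instead of the fixed 300/350-character offsets it uses `LastIndexOf` to find the nearest `href="` before each result marker, and the names `ExtractResultLinks`, `ResultMarker` and `HrefMarker` are hypothetical. It also assumes the results page still uses the `</a></h3><div class="s">` markup.

```vbnet
Imports System
Imports System.Collections.Generic

Module LinkExtraction
    ' Marker that follows each result title in the page source (same pattern as in my code).
    Const ResultMarker As String = "</a></h3><div class=""s"">"
    Const HrefMarker As String = "href="""

    ' Hypothetical helper: returns the href value closest before each result marker.
    Function ExtractResultLinks(html As String) As List(Of String)
        Dim links As New List(Of String)
        Dim pos As Integer = html.IndexOf(ResultMarker, StringComparison.Ordinal)
        While pos > -1
            ' Search backwards from the marker for the nearest href="
            Dim hrefStart As Integer = html.LastIndexOf(HrefMarker, pos, StringComparison.Ordinal)
            If hrefStart > -1 Then
                Dim valueStart As Integer = hrefStart + HrefMarker.Length
                Dim valueEnd As Integer = html.IndexOf(""""c, valueStart)
                If valueEnd > valueStart Then
                    links.Add(html.Substring(valueStart, valueEnd - valueStart))
                End If
            End If
            ' Continue after this marker so the same result is not matched twice
            pos = html.IndexOf(ResultMarker, pos + ResultMarker.Length, StringComparison.Ordinal)
        End While
        Return links
    End Function
End Module
```

In the program this would be called with `WebBrowser1.Document.Body.InnerHtml` once the page has loaded.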