How can I scrape a website that requires a Request Verification Token in the request?

I'm trying to scrape a website, but it requires a Request Verification Token in the request.

To handle this, I first fetch the Request Verification Token from the home page of the website, then try to scrape the site with that token included in the request. I'm using Scala 2.13.1, sbt 1.2.8, and the libraries "org.jsoup" % "jsoup" % "1.15.3" and "net.ruippeixotog" %% "scala-scraper" % "3.0.0".

With Akka HTTP Client

  implicit val system: ActorSystem = ActorSystem()
  implicit val executionContext = system.dispatcher

  // Define the URL you want to scrape
  val url = "https://solutions.virginia.gov/Notary/Search/Search"

  // Make a GET request to the URL
  val requestForToken = HttpRequest(HttpMethods.GET, url)
  val responseFutureForToken = Http().singleRequest(requestForToken)

  // Extract the RequestVerificationToken from the response headers
  val tokenFuture = responseFutureForToken.map { response =>
    val tokenHeaders = response.headers.toList
    val setCookieRequestVerificationToken = tokenHeaders(6).toString().split("\\s+|;|=").toList
    setCookieRequestVerificationToken(2)
  }

  // Wait for the token value and print it
  val requestVerificationToken = Await.result(tokenFuture, 5.seconds)
  println("Request Verification Code: " + requestVerificationToken)

  val postBody = s"__RequestVerificationToken=$requestVerificationToken&Query.FirstName=ab&Query.LastName=ab&Query.NotaryId="
  val request = HttpRequest(HttpMethods.POST, url, Nil, HttpEntity(ContentTypes.`application/x-www-form-urlencoded`, postBody))
  val responseFuture = Http(system).singleRequest(request)

  responseFuture
    .onComplete {
      case Success(res) => logger.info("Result is: {} ", res)
        Unmarshal(res.entity.toStrict(180 seconds)).value.map { result =>
          val htmlStr = result.data.utf8String
          val browser = JsoupBrowser()
          val doc = browser.parseString(htmlStr)
          println(doc)
        }
      case Failure(e) =>
        println(ServerMessage.EXCEPTION)
    }

Output:

Request Verification Code: GcgyA-aWo0c0FhTCQUwxd4ne14KPvPbiU6hBNPmbFJHACjnZOmIINyqa2EPwXq_82JbGMttj21V0NIw5yQnSDo4_SjAK-oOlcdCoyDGucSM1
18:04:31.873 521 [default-akka.actor.default-dispatcher-8] LogFactory$ INFO - Result is: HttpResponse(302 Found,List(Cache-Control: private, Location: /Notary/Error/500?aspxerrorpath=/Notary/Search/Search, Server: Microsoft-IIS/10.0, X-AspNetMvc-Version: 5.2, X-Powered-By: ASP.NET, X-Frame-Options: SAMEORIGIN, Date: Fri, 26 May 2023 13:04:30 GMT),HttpEntity.Strict(text/html; charset=UTF-8,170 bytes total),HttpProtocol(HTTP/1.1)) 
JsoupDocument(<html>
 <head>
  <title>Object moved</title>
 </head>
 <body>
  <h2>Object moved to <a href="/Notary/Error/500?aspxerrorpath=/Notary/Search/Search">here</a>.</h2>
 </body>
</html>)
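
The extraction step above picks the token by header position (`tokenHeaders(6)`) and splits on `=`, which breaks if the server reorders its headers or the value itself contains `=`. A minimal sketch of looking the cookie up by name instead, using only the Scala standard library; `SetCookieLookup`, `nameValue`, and `cookieValue` are hypothetical helper names, and the input is assumed to be the raw values of the `Set-Cookie` headers:

```scala
object SetCookieLookup {
  // Parse one raw Set-Cookie header value into its (name, value) pair.
  // Attributes after the first ';' (path, HttpOnly, ...) are ignored.
  // Only the first '=' separates name from value, so values that
  // contain '=' themselves are kept intact.
  def nameValue(setCookie: String): Option[(String, String)] = {
    val pair = setCookie.takeWhile(_ != ';')
    pair.indexOf('=') match {
      case -1 => None
      case i  => Some(pair.take(i).trim -> pair.drop(i + 1).trim)
    }
  }

  // Look a cookie up by name across all Set-Cookie header values.
  def cookieValue(setCookies: Seq[String], name: String): Option[String] =
    setCookies.flatMap(nameValue).collectFirst { case (n, v) if n == name => v }
}
```

With Akka HTTP, the raw values could probably be collected with something like `response.headers.filter(_.is("set-cookie")).map(_.value)`. Note also that if the site uses the standard ASP.NET MVC anti-forgery scheme (the `X-AspNetMvc-Version` header in the output suggests it might), the cookie value and the hidden form field value are two different tokens that are validated as a pair, so the cookie value alone is likely not what the form field expects.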

Without Akka HTTP Client for the POST (using JsoupBrowser)

  implicit val system: ActorSystem = ActorSystem()
  implicit val executionContext = system.dispatcher

  // Define the URL you want to scrape
  val url = "https://solutions.virginia.gov/Notary/Search/Search"

  // Make a GET request to the URL
  val requestForToken = HttpRequest(HttpMethods.GET, url)
  val responseFutureForToken = Http().singleRequest(requestForToken)

  // Extract the RequestVerificationToken from the response headers
  val tokenFuture = responseFutureForToken.map { response =>
    val tokenHeaders = response.headers.toList
    val setCookieRequestVerificationToken = tokenHeaders(6).toString().split("\\s+|;|=").toList
    setCookieRequestVerificationToken(2)
  }

  // Wait for the token value and print it
  val requestVerificationToken = Await.result(tokenFuture, 5.seconds)
  println("Request Verification Code: " + requestVerificationToken)

  val formData = Map(
    "__RequestVerificationToken" -> requestVerificationToken,
    "FirstName" -> "ab",
    "LastName" -> "ab",
    "NotaryId" -> "",
  )

  val browser = JsoupBrowser()
  val doc = browser.post(url, formData)
  println(doc)

Output

Request Verification Code: x68MuhunwLyyHNxZjjVJzqf-VAIAiCNtyfNDlBabjyIA7rO1XtURuEfYhiryMOqKnZGfR-oPP4DmzU0Ju3Ed3ULnhtBnLP9GE8tD-MxLNKU1
JsoupDocument(<!doctype html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta name="description" content="Notary, Kay Coles James, Commonwealth, Secretary of the Commonwealth, Glenn Youngkin, governor, virginia, VA">
  <meta name="robots" content="index,follow">
  <meta name="author" content="[email protected]">
  <title>500 - Error</title>

What am I doing wrong?

Any help is appreciated.
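
For reference, the two-step flow described above (fetch the token, then post it) could be sketched with plain jsoup as follows. This is only a sketch under assumptions: it assumes the site follows the usual ASP.NET MVC anti-forgery pattern, where the form token is a hidden `__RequestVerificationToken` input in the page HTML and must be posted together with the cookies set by the same GET. `NotarySearchSketch`, `hiddenToken`, and `search` are hypothetical names, and the form field names are taken from the second attempt above and may need to match the `name` attributes in the real form (the first attempt used `Query.FirstName` etc.).

```scala
import org.jsoup.{Connection, Jsoup}

object NotarySearchSketch {
  val Url = "https://solutions.virginia.gov/Notary/Search/Search"
  val TokenField = "__RequestVerificationToken"

  // Read the hidden anti-forgery field out of the form's HTML, if present.
  def hiddenToken(html: String): Option[String] =
    Option(Jsoup.parse(html).selectFirst(s"input[name=$TokenField]"))
      .map(_.attr("value"))
      .filter(_.nonEmpty)

  // Returns the title of the result page, or a message if no token was found.
  def search(firstName: String, lastName: String): Either[String, String] = {
    // Step 1: GET the form page; keep both its cookies and its body.
    val formPage = Jsoup.connect(Url).method(Connection.Method.GET).execute()
    hiddenToken(formPage.body()) match {
      case None => Left("no hidden token field found in the form page")
      case Some(token) =>
        // Step 2: POST the hidden-field token together with the cookies
        // from step 1, so the server can validate them as a pair.
        val result = Jsoup.connect(Url)
          .cookies(formPage.cookies())
          .data(TokenField, token)
          .data("FirstName", firstName) // assumed field name
          .data("LastName", lastName)   // assumed field name
          .data("NotaryId", "")         // assumed field name
          .post()
        Right(result.title())
    }
  }
}
```

Calling `NotarySearchSketch.search("ab", "ab")` would perform both requests. Whether the server accepts the POST still depends on the real field names and on any extra headers it checks, which this sketch does not verify.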
