HTTP error fetching URL. Status=500 while getting data from a website


I'm trying to get data from a website using Jsoup. The site expects a JSON payload ({"SEARCH_VALUE":"ab","STARTS_WITH_YN":false,"ACTIVE_ONLY_YN":false,"ELECTRONIC_NOTARY_ONLY_YN":false,"REMOTE_NOTARY_ONLY_YN":false}) in a POST request, but it responds with a 500 error. I also tried to get cookies from the home page, but that returns an empty map ({}).

scala version: "2.13.1"

sbt version: "1.2.8"

jsoup version: "1.15.3"

Here is my code in Scala:

  val homePageUrl = "https://firststop.sos.nd.gov/search/notary"
  val searchPageUrl = "https://firststop.sos.nd.gov/api/Records/notarysearch"
  val jsoup = Jsoup.connect(searchPageUrl)

  val response = jsoup.data("ACTIVE_ONLY_YN","false" )
    .data("SEARCH_VALUE", "ab")
    .data("ELECTRONIC_NOTARY_ONLY_YN", "0")
    .data("REMOTE_NOTARY_ONLY_YN", "false")
    .data("STARTS_WITH_YN", "false")
    .post()
  println(response)

Error:

Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=500, URL=[https://firststop.sos.nd.gov/api/Records/notarysearch]
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:890)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:829)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:366)
    at org.jsoup.helper.HttpConnection.post(HttpConnection.java:360)

What I tried:

I set a timeout and a user agent:

  val jsoup = Jsoup.connect(searchPageUrl)
    .userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36")
    .timeout(0)
    .ignoreHttpErrors(true)
    .ignoreContentType(true)
    .followRedirects(true)

  val response = jsoup.data("ACTIVE_ONLY_YN","false" )
    .data("SEARCH_VALUE", "ab")
    .data("ELECTRONIC_NOTARY_ONLY_YN", "0")
    .data("REMOTE_NOTARY_ONLY_YN", "false")
    .data("STARTS_WITH_YN", "false")
    .post()

  println(response)

I got the same error:

<html>
 <head></head>
 <body>
  {"code":"api/Records","message":"Internal Error Occurred","internalerror":null,"title":""}
 </body>
</html>

Then I tried setting the Content-Type header and sending the payload as the request body:

  val payload = """{SEARCH_VALUE:"ab",STARTS_WITH_YN:false,ACTIVE_ONLY_YN:false,ELECTRONIC_NOTARY_ONLY_YN:false,REMOTE_NOTARY_ONLY_YN:false}""".stripMargin

   val request: Connection.Response = Jsoup.connect(searchPageUrl)
     .header("Content-Type", "application/json")
     .method(Connection.Method.POST)
     .ignoreContentType(true)
     .requestBody(payload)
     .execute()
   val responseBody = request.body()
   println(responseBody)

but it still returns a 500 error.

I also tried it with scalaj.http:

  val payload = """{SEARCH_VALUE:"ab",STARTS_WITH_YN:false,ACTIVE_ONLY_YN:false,ELECTRONIC_NOTARY_ONLY_YN:false}""".stripMargin

  val response = Http(searchPageUrl).postData(payload).header("Content-Type", "application/json").asString
  println(response)

  val responseBody = response.body

  val json = responseBody.parseJson
  println(json)

I got this error:

HttpResponse({"code":"api/Records","message":"Internal Error Occurred","internalerror":null,"title":""},500,TreeMap(Access-Control-Allow-Headers -> Vector(Origin, Content-Type, Accept, Content-Encoding, Authorization), Access-Control-Allow-Methods -> Vector(*), Access-Control-Allow-Origin -> Vector(*), Access-Control-Expose-Headers -> Vector(session-timeout, Request-Context), Cache-Control -> Vector(no-cache), Connection -> Vector(close), Content-Length -> Vector(90), Content-Type -> Vector(application/json; charset=utf-8), Date -> Vector(Wed, 30 Aug 2023 07:29:59 GMT), Expires -> Vector(-1), Pragma -> Vector(no-cache), Request-Context -> Vector(appId=cid-v1:df24017c-37e9-4e1c-afab-260d80eaaeea), Server -> Vector(State of North Dakota), session-timeout -> Vector(0), Set-Cookie -> Vector(ASP.NET_SessionId=j3ioe0kyrrkaqrwlymi0dq0r; path=/; HttpOnly; SameSite=Lax), Status -> Vector(HTTP/1.1 500 Internal Server Error), X-AspNet-Version -> Vector(4.0.30319), X-Content-Type-Options -> Vector(nosniff), X-XSS-Protection -> Vector(1;  mode=block)))
{"code":"api/Records","internalerror":null,"message":"Internal Error Occurred","title":""}

What am I doing wrong? Is there another way to get data from this website?

Best answer:

There are several issues that have to be handled here:

  1. The POST request carries an authentication cookie: you must fetch it first and send it with the search request.
  2. The server is very strict and requires the request to include all the headers that the browser sends.
  3. The search data must be sent as JSON.
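
On point 3, note that JSON requires object keys to be double-quoted; the raw-string payloads in the question leave the keys unquoted, so they are not valid JSON. Below is a minimal sketch of one way to build the body with the standard library only. The class name NotarySearchPayload and its methods are hypothetical helpers introduced for illustration; the field names and their order are copied from the payload shown in the question.

```java
public class NotarySearchPayload {

    // Escapes the characters that JSON does not allow unescaped inside a string:
    // double quotes, backslashes, and control characters below U+0020.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the request body with quoted keys, a quoted search string,
    // and bare true/false literals for the boolean flags.
    public static String build(String searchValue, boolean startsWith, boolean activeOnly,
                               boolean electronicOnly, boolean remoteOnly) {
        return "{\"SEARCH_VALUE\":\"" + escape(searchValue) + "\""
            + ",\"STARTS_WITH_YN\":" + startsWith
            + ",\"ACTIVE_ONLY_YN\":" + activeOnly
            + ",\"ELECTRONIC_NOTARY_ONLY_YN\":" + electronicOnly
            + ",\"REMOTE_NOTARY_ONLY_YN\":" + remoteOnly
            + "}";
    }

    public static void main(String[] args) {
        System.out.println(build("ab", false, false, false, false));
    }
}
```

The resulting string can be passed to requestBody(...) as is. If a JSON library is already on the classpath, using it instead of hand-built strings is probably the safer choice.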

To get the cookie, open a connection to https://firststop.sos.nd.gov/api/GroupItems/Auth and store the returned cookies for later use. Add ignoreContentType(true) to that request as well, since the response is JSON, which jsoup will not parse by default (you don't need the content anyway).
For points 2 and 3, you can see how I did it in the following Java code:

import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String search_url = "https://firststop.sos.nd.gov/api/Records/notarysearch";
String auth_url = "https://firststop.sos.nd.gov/api/GroupItems/Auth";
try {
    // Step 1: call the auth endpoint only to collect the session cookies.
    Connection.Response con = Jsoup.connect(auth_url)
        .ignoreContentType(true)
        .method(Connection.Method.GET)
        .execute();
    System.out.println(con.cookies());

    // Step 2: POST the JSON payload with those cookies and the headers the browser sends.
    Document doc = Jsoup.connect(search_url)
        .requestBody("{\"SEARCH_VALUE\":\"ab\",\"STARTS_WITH_YN\":false,\"ACTIVE_ONLY_YN\":false,\"ELECTRONIC_NOTARY_ONLY_YN\":false,\"REMOTE_NOTARY_ONLY_YN\":false}")
        .cookies(con.cookies())
        .ignoreContentType(true)
        .header("Host", "firststop.sos.nd.gov")
        .header("Accept", "*/*")
        .header("Accept-Language", "en-US,en;q=0.5")
        .header("Accept-Encoding", "gzip, deflate, br")
        .header("Referer", "https://firststop.sos.nd.gov/search/notary")
        .header("authorization", "undefined")
        .header("content-type", "application/json")
        .header("Content-Length", "131") // byte length of the request body above
        .header("Origin", "https://firststop.sos.nd.gov")
        .header("DNT", "1")
        .header("Connection", "keep-alive")
        .header("Sec-Fetch-Dest", "empty")
        .header("Sec-Fetch-Mode", "cors")
        .header("Sec-Fetch-Site", "same-origin")
        .header("Pragma", "no-cache")
        .header("Cache-Control", "no-cache")
        .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0")
        .post();
    // jsoup wraps the JSON response in an HTML document; the JSON is the body text.
    System.out.println(doc);
} catch (IOException e) {
    e.printStackTrace();
}

Note the Content-Length header: I copied its value from the browser, but for other payloads you will have to write a method that calculates it. After that, all that remains is to parse the output for your needs.
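
The Content-Length calculation mentioned above could be sketched as follows. Content-Length counts the bytes of the encoded body rather than Java characters, so the sketch encodes the string first. The class name ContentLength and the choice of UTF-8 are assumptions made for illustration; they are not confirmed for this server.

```java
import java.nio.charset.StandardCharsets;

public class ContentLength {

    // Returns the number of bytes the body occupies once encoded as UTF-8.
    // For ASCII-only payloads this equals String.length(); for other text it can be larger.
    public static int of(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String body = "{\"SEARCH_VALUE\":\"ab\",\"STARTS_WITH_YN\":false,"
            + "\"ACTIVE_ONLY_YN\":false,\"ELECTRONIC_NOTARY_ONLY_YN\":false,"
            + "\"REMOTE_NOTARY_ONLY_YN\":false}";
        System.out.println(of(body)); // prints 131, the value copied from the browser
    }
}
```

With such a helper, the hard-coded value could be replaced by .header("Content-Length", String.valueOf(ContentLength.of(payload))), where payload is the same string passed to requestBody(...).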