Download File with JSON response as redirection via WebRequest


I am struggling to find a way to download files via WebRequest.
The API is quite simple. For example, I have the following address:
https://eprel.ec.europa.eu/api/products/tyres/381324/labels?format=PDF
In this case a label is shown in the browser.

With /labels I can download a zip package.
With /labels?noRedirect=true&format=PDF the response will be a 200 OK with the content containing the URL of the resource ({address: label or fiche URL}).

With the code:

Dim request As WebRequest = WebRequest.Create("https://eprel.ec.europa.eu/api/products/tyres/381324/labels?noRedirect=true&format=PDF")
request.Credentials = CredentialCache.DefaultCredentials

Dim response As WebResponse = request.GetResponse()
Console.WriteLine(CType(response, HttpWebResponse).StatusDescription)
Console.WriteLine(response)

Using dataStream As Stream = response.GetResponseStream()
    Dim reader As New StreamReader(dataStream)
    Dim responsefromServer As String = reader.ReadToEnd()
    Console.WriteLine(responsefromServer)
End Using

I get the OK response but not the URL for downloading the label.
Also, if I use just /labels?format=PDF, I am not able to simply save the displayed PDF.

I also tried Selenium, but that solution is way too slow, so I would prefer to stick with WebRequest.

Maybe someone can help.

Jimi On

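The API responds in two ways, depending on the URI of the request: a query URI (such as /labels?format=PDF) is redirected to the resource, while a direct resource URI (such as /label/Label_381324.pdf) returns the file itself.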

Note: this is a binary file, you cannot use a StreamReader to read binary data.
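To illustrate the note above, here is a minimal sketch of reading a stream as raw bytes instead of decoding it as text. The helper name StreamToBytes and the module name are hypothetical, not part of the original code; it assumes .NET Framework 4.0 or later, where Stream.CopyTo() is available.

```vbnet
Imports System
Imports System.IO

Module BinaryReadSketch
    ' Hypothetical helper: copies any readable stream into a byte array.
    ' No text decoding takes place, so binary content such as a PDF stays intact.
    Public Function StreamToBytes(source As Stream) As Byte()
        Using target As New MemoryStream()
            source.CopyTo(target)
            Return target.ToArray()
        End Using
    End Function
End Module
```

A StreamReader, by contrast, would decode the bytes with a text encoding, which can alter or drop byte sequences that are not valid characters.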


Here are two methods that let you download your PDF files as a byte array.
GetPDFResourceAsync() takes a Uri in the form of a query, gets the response and then calls the second method, GetPDFResourceDirectAsync(), passing the WebResponse.ResponseUri it received from the server.

If you want to use the query URI, call GetPDFResourceAsync(); if you want to use the direct resource Uri, just call GetPDFResourceDirectAsync().

The InitializeWebRequest() method is a utility method that initializes the HttpWebRequest. Without it, GetPDFResourceAsync() would not work, since the server expects a User-Agent header and a compression method to be set (otherwise you get garbage).

These methods return a byte array (the PDF file data).
You can store it to disk using the File.WriteAllBytes() method. E.g.:

' Indirect method, using a URI query
Dim pdfBytes = Await GetPDFResourceAsync(New Uri("https://eprel.ec.europa.eu/api/products/tyres/381324/labels?format=PDF"))
' Alternatively, the direct method, using a resource URI
' Dim pdfBytes = Await GetPDFResourceDirectAsync(New Uri("https://eprel.ec.europa.eu/label/Label_381324.pdf"))

Dim pdfFilePath = Path.Combine("[Some Directory]", "Label381324.pdf")
File.WriteAllBytes(pdfFilePath, pdfBytes)

The worker methods:
If you don't want to (or can't) use async methods, remove Async and Await everywhere (method names included, so GetResponseAsync() becomes GetResponse()) and change the return type from Task(Of Byte()) to Byte(); you'll then have synchronous code.

Public Async Function GetPDFResourceAsync(resourceUri As Uri) As Task(Of Byte())
    Dim request = WebRequest.CreateHttp(resourceUri)
    InitializeWebRequest(request)
    Using locResponse As HttpWebResponse = DirectCast(Await request.GetResponseAsync(), HttpWebResponse)
        If locResponse.StatusCode = HttpStatusCode.OK Then
            Return Await GetPDFResourceDirectAsync(locResponse.ResponseUri)
        Else
            Return Nothing
        End If
    End Using
End Function

Public Async Function GetPDFResourceDirectAsync(resourceUri As Uri) As Task(Of Byte())
    Dim request = WebRequest.CreateHttp(resourceUri)
    InitializeWebRequest(request)

    Dim buffersize As Integer = 132072
    Dim buffer As Byte() = New Byte(buffersize - 1) {}

    ' Dispose the response when done, so the underlying connection is released
    Using dataResponse = DirectCast(Await request.GetResponseAsync(), HttpWebResponse)
        If dataResponse.StatusCode = HttpStatusCode.OK Then
            Using responseStream As Stream = dataResponse.GetResponseStream(),
                mStream As MemoryStream = New MemoryStream()
                Dim read As Integer = 0
                ' Copy the binary content in chunks until the stream is exhausted
                Do
                    read = Await responseStream.ReadAsync(buffer, 0, buffer.Length)
                    Await mStream.WriteAsync(buffer, 0, read)
                Loop While read > 0
                Return mStream.ToArray()
            End Using
        End If
    End Using
    Return Nothing
End Function

Private Sub InitializeWebRequest(request As HttpWebRequest)
    request.UserAgent = "Mozilla/5.0 (Windows NT 10; WOW64; Trident/7.0; rv:11.0) like Gecko"
    request.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
    request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate;q=0.8")
    request.Headers.Add(HttpRequestHeader.CacheControl, "no-cache")
End Sub
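
For the synchronous route mentioned above, a minimal sketch of the direct download could look like the following. The function name GetPDFResourceDirect and the module name are assumptions chosen for illustration; the request settings are copied from InitializeWebRequest(), and the sketch assumes .NET Framework 4.5 or later (for WebRequest.CreateHttp()).

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module SyncDownloadSketch
    ' Assumed synchronous counterpart of GetPDFResourceDirectAsync().
    ' Returns the resource as a byte array, or Nothing if the status is not 200 OK.
    Public Function GetPDFResourceDirect(resourceUri As Uri) As Byte()
        Dim request As HttpWebRequest = WebRequest.CreateHttp(resourceUri)
        ' Same settings as in InitializeWebRequest()
        request.UserAgent = "Mozilla/5.0 (Windows NT 10; WOW64; Trident/7.0; rv:11.0) like Gecko"
        request.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate

        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            If response.StatusCode <> HttpStatusCode.OK Then Return Nothing
            Using responseStream As Stream = response.GetResponseStream(),
                  mStream As New MemoryStream()
                ' CopyTo reads the binary content until the end of the stream
                responseStream.CopyTo(mStream)
                Return mStream.ToArray()
            End Using
        End Using
    End Function
End Module
```

Note that GetResponse() throws a WebException for most non-success status codes, so callers may want to wrap the call in a Try/Catch block rather than rely only on the Nothing return value.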