I am struggling to find a way to download files via WebRequest.
The API is quite simple. For example, I have the following address:
https://eprel.ec.europa.eu/api/products/tyres/381324/labels?format=PDF
In this case a label is shown in the browser.
With /labels I can download a zip package.
With /labels?noRedirect=true&format=PDF the response will be a 200 OK with the content containing the URL of the resource ({address: label or fiche URL}).
With the code:
Dim request As WebRequest = WebRequest.Create("https://eprel.ec.europa.eu/api/products/tyres/381324/labels?noRedirect=true&format=PDF")
request.Credentials = CredentialCache.DefaultCredentials
Dim response As WebResponse = request.GetResponse()
Console.WriteLine(CType(response, HttpWebResponse).StatusDescription)
Console.WriteLine(response)
Using dataStream As Stream = response.GetResponseStream()
Dim reader As New StreamReader(dataStream)
Dim responsefromServer As String = reader.ReadToEnd()
Console.WriteLine(responsefromServer)
End Using
I get the OK response but not the URL for downloading the label.
Also, if I use just /labels?format=PDF, I am not able to save the PDF that is shown.
I also tried Selenium, but that solution is way too slow, so I would prefer to stick with WebRequest.
Maybe someone can help.
The API has a two-way response, based on the URI of the request:
https://eprel.ec.europa.eu/api/products/tyres/381324/labels?format=PDF
The response is a JSON that specifies a URL fragment that replaces the query, to build a new URI to the direct resource. But it also changes the WebResponse.ResponseUri address to this exact location. Don't use the ?noRedirect=true option. The ResponseStream will contain the data to download.
Note: this is a binary file, you cannot use a StreamReader to read binary data.
Here are two methods that allow you to download your PDF files as a byte array.
GetPDFResourceAsync() takes a Uri in the form of a query, gets the response and then calls the second method, GetPDFResourceDirectAsync(), passing the WebResponse.ResponseUri it received from the server.
If you want to use the query URI, call GetPDFResourceAsync(); if you want to use the direct resource Uri, just call GetPDFResourceDirectAsync().
The InitializeWebRequest() method is a utility method that initializes the HttpWebRequest. Without it, GetPDFResourceAsync() would not work, since the server expects a User-Agent header and a compression method to be set (otherwise you get garbage).
Now, these methods return a byte array (the PDF file data).
You can store it to disc using the File.WriteAllBytes() method. E.g.:
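A minimal sketch of the saving step, assuming the download method returns the complete PDF as a Byte array. The module name LabelSaver, the helper SaveLabelAsync and its downloader parameter are hypothetical; the parameter only stands in for GetPDFResourceAsync() (or GetPDFResourceDirectAsync()) so that the snippet compiles on its own.

```vb
Imports System
Imports System.IO
Imports System.Threading.Tasks

Public Module LabelSaver

    ' downloader is a stand-in for GetPDFResourceAsync / GetPDFResourceDirectAsync
    ' (hypothetical parameter, used only to keep this sketch self-contained).
    Public Async Function SaveLabelAsync(downloader As Func(Of Uri, Task(Of Byte())),
                                         labelUri As Uri,
                                         filePath As String) As Task
        ' The download yields raw bytes; no text decoding is involved.
        Dim pdfBytes As Byte() = Await downloader(labelUri)

        ' Creates the file, or overwrites it if it already exists.
        File.WriteAllBytes(filePath, pdfBytes)
    End Function

End Module
```

From an Async method it could be called as Await SaveLabelAsync(AddressOf GetPDFResourceAsync, New Uri("https://eprel.ec.europa.eu/api/products/tyres/381324/labels?format=PDF"), Path.Combine(Path.GetTempPath(), "label_381324.pdf")), where the target file name is only an example.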
The worker methods:
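What follows is a sketch of how the three methods described above might look, not necessarily the exact original code. The module name EprelDownloader, the User-Agent string and the GZip/Deflate decompression setting are assumptions; the description only says that a User-Agent header and a compression method must be set.

```vb
Imports System
Imports System.IO
Imports System.Net
Imports System.Threading.Tasks

Public Module EprelDownloader

    ' Query form (...?format=PDF): the request is redirected, and
    ' ResponseUri then holds the address of the direct resource.
    Public Async Function GetPDFResourceAsync(resourceUri As Uri) As Task(Of Byte())
        Dim request As HttpWebRequest = InitializeWebRequest(resourceUri)
        Using response As WebResponse = Await request.GetResponseAsync()
            Return Await GetPDFResourceDirectAsync(response.ResponseUri)
        End Using
    End Function

    ' Direct form: copy the binary response stream into memory
    ' and return it as a Byte array (no StreamReader involved).
    Public Async Function GetPDFResourceDirectAsync(resourceUri As Uri) As Task(Of Byte())
        Dim request As HttpWebRequest = InitializeWebRequest(resourceUri)
        Using response As WebResponse = Await request.GetResponseAsync()
            Using source As Stream = response.GetResponseStream()
                Using memory As New MemoryStream()
                    Await source.CopyToAsync(memory)
                    Return memory.ToArray()
                End Using
            End Using
        End Using
    End Function

    ' Shared request setup. The header values are assumptions.
    Private Function InitializeWebRequest(resourceUri As Uri) As HttpWebRequest
        Dim request As HttpWebRequest = WebRequest.CreateHttp(resourceUri)
        request.Method = WebRequestMethods.Http.Get
        request.AllowAutoRedirect = True
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        request.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate
        request.Credentials = CredentialCache.DefaultCredentials
        Return request
    End Function

End Module
```

Note that GetPDFResourceAsync() issues two requests in this sketch: one to resolve the redirect and one to fetch the resolved address. That mirrors the description above, at the cost of an extra round trip.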
In case you don't want to / can't use async methods, just remove async and await from everywhere (method names included) and you'll have synchronous code.