WebClient.DownloadString stops after a specific character


I am trying to download the HTML of a web page (UTF-8 encoded) using WebClient.DownloadString:

using (WebClient client = new WebClient())
{
    client.Proxy = WebRequest.DefaultWebProxy;
    client.Encoding = Encoding.UTF8;
    html = client.DownloadString(url);
}

When I use Firefox's "View Page Source" on the URL, it shows source code that includes the following part somewhere in the middle of the HTML:

[Screenshot of the relevant part of the page source; image not available]

The problem is that the returned HTML is incomplete: it ends with "Sept".

The funny thing is that when I paste the whole source into Notepad++, it also stops at "Sept". The hex view in Notepad++ does not even show the character.

Does anyone know which character this could be and how I can handle it?

Thank you!

Answer by AudioBubble

Try this method:

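// Requires: using System.IO; using System.Net; using System.Text;
string result = HttpGet("https://en.wikipedia.org/wiki/List_of_Unicode_characters");

private string HttpGet(string url)
{
   var httpRequest = (HttpWebRequest)WebRequest.Create(url);
   httpRequest.Method = "GET";
   // Some servers reject requests that lack a browser-like User-Agent.
   httpRequest.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:104.0) Gecko/20100101 Firefox/104.0";

   // Dispose the response as well as the reader so the connection is released.
   using (var httpResponse = (HttpWebResponse)httpRequest.GetResponse())
   {
      if (httpResponse.StatusCode != HttpStatusCode.OK)
         return null;

      // The page is UTF-8, so decode the response stream explicitly as UTF-8.
      using (var streamReader = new StreamReader(httpResponse.GetResponseStream(), Encoding.UTF8))
      {
         return streamReader.ReadToEnd();
      }
   }
}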

Unfortunately, you did not link to the page you are downloading, so I cannot check whether this method works for it.
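
To find out which character follows "Sept", it may help to inspect the downloaded string itself rather than relying on how an editor displays it. The sketch below is only a guess at the cause: it assumes the string contains a control character (for example NUL, U+0000) that some tools treat as the end of the text. The class and method names (HtmlInspector, FindFirstControlChar, StripControlChars) are made up for this example and are not part of .NET.

```csharp
using System.Text;

static class HtmlInspector
{
    // Control characters other than tab, CR and LF are treated as suspicious.
    static bool IsSuspicious(char c) =>
        char.IsControl(c) && c != '\t' && c != '\r' && c != '\n';

    // Returns the index of the first suspicious character, or -1 if there is none.
    public static int FindFirstControlChar(string html)
    {
        for (int i = 0; i < html.Length; i++)
        {
            if (IsSuspicious(html[i]))
                return i;
        }
        return -1;
    }

    // Returns a copy of the input with all suspicious characters removed.
    public static string StripControlChars(string html)
    {
        var sb = new StringBuilder(html.Length);
        foreach (char c in html)
        {
            if (!IsSuspicious(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

If FindFirstControlChar(html) returns an index i that is not -1, printing (int)html[i] shows the code point of the character. If it returns -1 and html.Length is still shorter than expected, the string really was cut off during the download and the cause lies elsewhere.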