I am currently developing a C# library that uses HTTP requests and a WebBrowser control. My library is used in a WinDev program and creates a direct link between the WinDev application and a web platform (agenda.ch). I needed to do a bit of web scraping, so I first started off with the HtmlAgilityPack, and that worked just fine. But when my library runs from WinDev, it suddenly stops as soon as the HtmlAgilityPack HtmlDocument is instantiated... I then decided to remove the HtmlAgilityPack and use the System.Windows.Forms HtmlElement class directly to retrieve the information I need.
This is where my problem comes in: when using a foreach loop to check each HtmlElement in the document, the only way I have to check its class value is the GetAttribute() method. But for some reason the value returned is always empty. I ran many different tests and none of them gave a logical result, which is why I turned to StackOverflow. I tried another attribute name, such as id, and that worked fine. I just can't understand why the class attribute value can't be retrieved.
private void RecoverClients(HtmlDocument source)
{
    HtmlDocument doc = source;
    HtmlElementCollection clientSection = doc.GetElementsByTagName("DIV");
    HtmlElement clients = null;
    foreach (HtmlElement element in clientSection)
    {
        // Tests
        var test = element.GetAttribute("class"); // Always empty
        var test2 = element.GetAttribute("id");   // Works when the element has an id attribute
        if (element.GetAttribute("class") == "customer_list") // The code I use
        {
            clients = element;
            break;
        }
    }
}
This is a portion of the HTML code that is retrieved by the WebBrowser and passed to the RecoverClients method.
<DIV class="customer_list">
  <UL>
    <LI data-id="xxxx">
      <A href="#customers/xxxx" data-action="show">
        <STRONG>ClientName</STRONG>ClientSirName<BR><SMALL>[email protected]</SMALL>
      </A>
    </LI>
    <LI data-id="xxxx">
      <A href="#customers/xxxx" data-action="show">
        <STRONG>ClientName</STRONG>ClientSirName<BR><SMALL>[email protected]</SMALL>
      </A>
    </LI>
  </UL>
</DIV>
Please let me know if you have already run into this kind of problem, or if I'm not using the proper technique to retrieve an HtmlElement by its class name.
Please note that I can't use the HtmlAgilityPack: it worked fine on its own, but it causes problems once the library is used from WinDev...
After struggling with the same problem, my trial-and-error suggests that while GetAttribute("class") will sometimes return the value of the class attribute, GetAttribute("className") will always do so, so try using that instead.
I know this answer comes rather late, but I struggled with the same issue for quite some time before stumbling upon this solution. This odd and unpredictable behavior does not seem to be well documented by Microsoft, so hopefully anyone who runs into a similar problem will find this answer helpful.
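To make the lookup a little more robust, here is a minimal sketch of a helper that tries "className" first and falls back to "class", then checks for a single class token. Two assumptions are worth stating: the class name HtmlClassMatcher and its methods are hypothetical (not part of System.Windows.Forms), and the comparison is case-sensitive. A commonly cited explanation for the behavior, which I have not verified for WinDev specifically, is that the WebBrowser control renders pages in an older Internet Explorer document mode by default, where getAttribute expects the DOM property name (className) rather than the HTML attribute name (class). The attribute reader takes a GetAttribute-style delegate so the logic can be tested without a WebBrowser. The token match also matters because a class attribute can hold several space-separated names, in which case a plain == "customer_list" comparison would fail.

```csharp
using System;

// Hypothetical helper, not part of System.Windows.Forms.
public static class HtmlClassMatcher
{
    // Reads the class list through a GetAttribute-style delegate.
    // Tries "className" first (reported to work reliably in the WebBrowser control),
    // then falls back to "class". Returns an empty string if neither yields a value.
    public static string ReadClassAttribute(Func<string, string> getAttribute)
    {
        if (getAttribute == null) throw new ArgumentNullException(nameof(getAttribute));

        string value = getAttribute("className");
        if (string.IsNullOrEmpty(value))
        {
            value = getAttribute("class");
        }
        return value ?? string.Empty;
    }

    // The class attribute is a whitespace-separated list, so "customer_list active"
    // still matches "customer_list". The comparison is ordinal (case-sensitive).
    public static bool HasClass(string classAttribute, string className)
    {
        if (string.IsNullOrWhiteSpace(classAttribute) || string.IsNullOrWhiteSpace(className))
        {
            return false;
        }

        string[] tokens = classAttribute.Split(
            new[] { ' ', '\t', '\r', '\n', '\f' },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string token in tokens)
        {
            if (string.Equals(token, className, StringComparison.Ordinal))
            {
                return true;
            }
        }
        return false;
    }
}
```

Inside the loop from the question, the condition would then become HtmlClassMatcher.HasClass(HtmlClassMatcher.ReadClassAttribute(element.GetAttribute), "customer_list"), since HtmlElement.GetAttribute(string) converts directly to a Func<string, string>.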