I am writing some code to loop through every element in an HTML page and extract all IDs and classes.
My current code extracts the IDs, but I can't see a way to get the classes. Does anybody know where I can access them?
    private void ParseElements()
    {
        // GET: Document from Browser
        HtmlDocument ThisDocument = Browser.Document;
        // DECLARE: List of IDs
        List<string> ListIdentifiers = new List<string>();
        // LOOP: Through Each Element
        for (int LoopA = 0; LoopA < ThisDocument.All.Count; LoopA += 1)
        {
            // DETERMINE: Whether ID Exists in Element
            if (ThisDocument.All[LoopA].Id != null)
            {
                // ADD: Identifier to List
                ListIdentifiers.Add(ThisDocument.All[LoopA].Id);
            }
        }
    }
You could get the outer HTML of each node (which includes its opening tag) and use a regular expression to pull out the class attribute. Or you could try the HTML Agility Pack.
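For the regular-expression route, here is a minimal sketch. It assumes you already have the markup as a string (for example from Browser.DocumentText), and the names ClassExtractor and GetClassNames are made up for illustration. The pattern only handles quoted class attributes and can be fooled by unusual markup, so treat it as a starting point rather than a robust parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ClassExtractor
{
    // Hypothetical pattern: finds class="..." or class='...' inside an opening tag.
    // The \s before "class" keeps attributes such as data-class from matching.
    private static readonly Regex ClassAttribute = new Regex(
        @"<[^>]*?\sclass\s*=\s*(?:""(?<v>[^""]*)""|'(?<v>[^']*)')",
        RegexOptions.IgnoreCase);

    // Returns each distinct class name found in the markup, in order of first appearance.
    public static List<string> GetClassNames(string html)
    {
        var names = new List<string>();
        foreach (Match match in ClassAttribute.Matches(html))
        {
            // One class attribute can hold several whitespace-separated names.
            string[] parts = match.Groups["v"].Value.Split(
                new[] { ' ', '\t', '\r', '\n' },
                StringSplitOptions.RemoveEmptyEntries);

            foreach (string part in parts)
            {
                if (!names.Contains(part))
                {
                    names.Add(part);
                }
            }
        }
        return names;
    }
}
```

Calling ClassExtractor.GetClassNames(Browser.DocumentText) would then give one list for the whole page. If you would rather stay inside your existing loop, HtmlElement.GetAttribute("className") is the call usually suggested for reading the class from the WebBrowser control's elements, though I have not tested it against your page.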
Something like...