Loop child nodes inside parent in AngleSharp (C# or VB)


I am using AngleSharp to parse an HTML string.

I can get it to parse an outer div that contains some inner div / h1 tags I want to extract, but I can't work out how to loop through the elements inside it.

So far I have:

    Dim outer_linq = document.All.Where(Function(w) w.LocalName = "div" AndAlso w.ClassList.Contains("the-product"))
    For Each item In outer_linq

        If item.LocalName = "h1" AndAlso item.ClassList.Contains("the-product-title") Then
            ' Found h1.the-product-title, so do something with it here
        End If

        If item.LocalName = "div" AndAlso item.ClassList.Contains("price") Then
            ' Found div.price, so do something with it here
        End If

    Next

So it is finding everything inside div.the-product, but how do I look through each div.the-product and get the h1.the-product-title and div.price that belong to it?

There are several div.the-product elements, and each one contains an h1.the-product-title and a div.price.

I'm using VB, but C# would be OK too.

Thanks if anyone can help.

1 Answer

While you can leverage techniques such as LINQ in AngleSharp, we encourage everyone to use the DOM (Document Object Model) as much as possible.

Instead of using document.All.Where, you should use document.QuerySelectorAll:

    document.QuerySelectorAll("div.the-product")
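To see that the selector returns the same elements as the LINQ filter from the question, here is a minimal self-contained sketch. It assumes a recent AngleSharp version (0.10 or later), where HtmlParser lives in the AngleSharp.Html.Parser namespace and exposes ParseDocument; the sample markup and the module/function names are hypothetical.

```vb
Option Strict On
Option Infer On

Imports System
Imports System.Linq
Imports AngleSharp.Html.Parser

Module SelectorVsLinq
    ' Hypothetical sample markup: two products, each with a title and a price.
    Function SampleHtml() As String
        Return "<div class='the-product'><h1 class='the-product-title'>Widget</h1><div class='price'>9.99</div></div>" &
               "<div class='the-product'><h1 class='the-product-title'>Gadget</h1><div class='price'>19.99</div></div>"
    End Function

    ' The question's approach: walk every element in the document and filter with LINQ.
    Function CountWithLinq(html As String) As Integer
        Dim parser As New HtmlParser()
        Dim document = parser.ParseDocument(html)
        Return document.All.Where(Function(w) w.LocalName = "div" AndAlso w.ClassList.Contains("the-product")).Count()
    End Function

    ' The DOM approach: let the CSS selector engine do the filtering.
    Function CountWithSelector(html As String) As Integer
        Dim parser As New HtmlParser()
        Dim document = parser.ParseDocument(html)
        Return document.QuerySelectorAll("div.the-product").Length
    End Function

    Sub Main()
        Console.WriteLine(CountWithLinq(SampleHtml()))      ' 2
        Console.WriteLine(CountWithSelector(SampleHtml()))  ' 2
    End Sub
End Module
```

Both functions find the two div.the-product elements; the selector version simply states the intent in one string instead of a hand-written predicate.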

You could even then perform the nesting directly, e.g.,

    document.QuerySelectorAll("div.the-product h1.the-product-title")

would find all h1 elements with the class the-product-title that are descendants of div elements with the class the-product. If you want only direct children (instead of any descendants), use the child combinator >:

    document.QuerySelectorAll("div.the-product > h1.the-product-title")
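The difference between the two combinators shows up as soon as a title is wrapped in another element. The sketch below uses hypothetical markup in which the second title sits inside a header element, and assumes the same AngleSharp 0.10+ parser API (HtmlParser.ParseDocument) as above; the module and function names are illustrative only.

```vb
Option Strict On
Option Infer On

Imports System
Imports AngleSharp.Html.Parser

Module DescendantVsChild
    ' Hypothetical markup: the second title is wrapped in a <header>, so it is
    ' a descendant of div.the-product but not a direct child of it.
    Function SampleHtml() As String
        Return "<div class='the-product'><h1 class='the-product-title'>Widget</h1></div>" &
               "<div class='the-product'><header><h1 class='the-product-title'>Gadget</h1></header></div>"
    End Function

    ' Parses the markup and counts how many elements match the given selector.
    Function CountMatches(html As String, selector As String) As Integer
        Dim parser As New HtmlParser()
        Dim document = parser.ParseDocument(html)
        Return document.QuerySelectorAll(selector).Length
    End Function

    Sub Main()
        ' Descendant combinator (space): matches both titles.
        Console.WriteLine(CountMatches(SampleHtml(), "div.the-product h1.the-product-title"))    ' 2
        ' Child combinator (>): matches only the title directly inside the div.
        Console.WriteLine(CountMatches(SampleHtml(), "div.the-product > h1.the-product-title"))  ' 1
    End Sub
End Module
```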

What your code above does wrong is that it tests item itself again. Every item you retrieve is already a div element with the class the-product (that's what you iterate over), so it can never also be an h1.the-product-title or a div.price.
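If you prefer to keep the LocalName/ClassList checks from the question, they need to run against the children of each product rather than against the product itself. A minimal sketch of that, assuming AngleSharp 0.10+ and hypothetical sample markup and names; note that Children only yields direct child elements, so deeper nesting would need item.QuerySelectorAll("*") or a selector instead.

```vb
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports AngleSharp.Html.Parser

Module LoopChildren
    ' Hypothetical sample markup: two products, each with a title and a price.
    Function SampleHtml() As String
        Return "<div class='the-product'><h1 class='the-product-title'>Widget</h1><div class='price'>9.99</div></div>" &
               "<div class='the-product'><h1 class='the-product-title'>Gadget</h1><div class='price'>19.99</div></div>"
    End Function

    ' Applies the question's checks to each direct child of every product div.
    Function DescribeChildren(html As String) As List(Of String)
        Dim parser As New HtmlParser()
        Dim document = parser.ParseDocument(html)
        Dim results As New List(Of String)()
        For Each item In document.QuerySelectorAll("div.the-product")
            ' Loop over the child elements of the product, not the product itself.
            For Each child In item.Children
                If child.LocalName = "h1" AndAlso child.ClassList.Contains("the-product-title") Then
                    results.Add("title: " & child.TextContent.Trim())
                ElseIf child.LocalName = "div" AndAlso child.ClassList.Contains("price") Then
                    results.Add("price: " & child.TextContent.Trim())
                End If
            Next
        Next
        Return results
    End Function

    Sub Main()
        ' Prints "title: Widget", "price: 9.99", "title: Gadget", "price: 19.99".
        For Each line In DescribeChildren(SampleHtml())
            Console.WriteLine(line)
        Next
    End Sub
End Module
```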

An easy fix is to keep the outer loop as written above, but inside it query relative to item, e.g.,

    Dim allInnerH1 = item.QuerySelectorAll("h1.the-product-title")
    Dim allInnerPrices = item.QuerySelectorAll("div.price")
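Putting the outer loop and the per-item queries together gives one title/price pair per product. The sketch below is one way to do it under the same assumptions (AngleSharp 0.10+, hypothetical markup and names). Since each product is expected to hold a single title and a single price, it uses QuerySelector, which returns the first match or Nothing, instead of QuerySelectorAll; the third sample product deliberately has no price.

```vb
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports AngleSharp.Html.Parser

Module ReadProductsDemo
    ' Hypothetical sample markup; the last product has no price on purpose.
    Function SampleHtml() As String
        Return "<div class='the-product'><h1 class='the-product-title'>Widget</h1><div class='price'>9.99</div></div>" &
               "<div class='the-product'><h1 class='the-product-title'>Gadget</h1><div class='price'>19.99</div></div>" &
               "<div class='the-product'><h1 class='the-product-title'>Gizmo</h1></div>"
    End Function

    Function ReadProducts(html As String) As List(Of String)
        Dim parser As New HtmlParser()
        Dim document = parser.ParseDocument(html)
        Dim results As New List(Of String)()
        For Each item In document.QuerySelectorAll("div.the-product")
            ' Query relative to item, so each title is paired with the price from the same product.
            Dim title = item.QuerySelector("h1.the-product-title")
            Dim price = item.QuerySelector("div.price")
            ' QuerySelector returns Nothing when there is no match, so guard against missing parts.
            Dim titleText = If(title Is Nothing, "(no title)", title.TextContent.Trim())
            Dim priceText = If(price Is Nothing, "(no price)", price.TextContent.Trim())
            results.Add(titleText & " = " & priceText)
        Next
        Return results
    End Function

    Sub Main()
        ' Prints "Widget = 9.99", "Gadget = 19.99", "Gizmo = (no price)".
        For Each line In ReadProducts(SampleHtml())
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Querying per product is what keeps the pairs aligned: if you instead collected every title and every price from the whole document into two separate lists, a single missing price would shift all later prices onto the wrong titles.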