VB .NET HTMLAgilityPack Colon Separated Values


Is there a way to get the values stored inside a tag's attribute using HtmlAgilityPack? My variable dataNode is an HtmlAgilityPack.HtmlNode selected from this document:

Dim doc as New HtmlAgilityPack.HtmlDocument()

doc.LoadHtml("
<div id=""container"" data=""id:12,country:usa,city:oregon,id:13,country:usa,city:atlanta"">
    <a href=""http://www.google.com"">Google</a>
</div>
")

I would like to get the value of each id, country, and city. They repeat within the data attribute and have different values.

Dim dataNode As HtmlAgilityPack.HtmlNode

dataNode = doc.documentNode.SelectSingleNode("//div")
txtbox.text = dataNode.Attributes("id[1]").value

This throws a System.NullReferenceException.

There is 1 best solution below

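You need the "data" attribute, not the "id" attribute. The div has no attribute named "id[1]", so Attributes("id[1]") returns Nothing, and reading .Value from it throws the NullReferenceException.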
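As a minimal sketch of reading the attribute defensively: HtmlNode.GetAttributeValue takes a fallback value that it returns when the attribute is missing, which avoids dereferencing Nothing. The module name AttributeLookupSketch and the shortened sample HTML are made up for illustration.

```vbnet
Option Infer On
Option Strict On

Imports System
Imports HtmlAgilityPack

Module AttributeLookupSketch

    Sub Main()
        Dim doc As New HtmlDocument()
        ' Shortened sample markup, for illustration only
        doc.LoadHtml("<div id=""container"" data=""id:12,country:usa,city:oregon""></div>")

        Dim dataNode = doc.DocumentNode.SelectSingleNode("//div")
        If dataNode Is Nothing Then
            ' SelectSingleNode returns Nothing when no element matches
            Console.WriteLine("No div found")
            Return
        End If

        ' Falls back to an empty string if there is no "data" attribute
        Dim rawData = dataNode.GetAttributeValue("data", "")
        Console.WriteLine(rawData)
    End Sub

End Module
```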

Once you have the value of the correct attribute, you will need to parse it into some data structure suitable for holding each part of the data, for example:

Option Infer On
Option Strict On

Module Module1

    Public Class LocationDatum
        Property ID As Integer
        Property Country As String
        Property City As String

        Public Overrides Function ToString() As String
            Return $"ID={ID}, Country={Country}, City={City}"
        End Function

    End Class


    Sub Main()
        Dim doc As New HtmlAgilityPack.HtmlDocument()

        doc.LoadHtml("
<div id=""container"" data=""id:12,country:usa,city:oregon,id:13,country:usa,city:atlanta"">
    <a href=""http://www.google.com"">Google</a>
</div>
")

        Dim dataNode = doc.DocumentNode.SelectSingleNode("//div")
        Dim rawData = dataNode.Attributes("data").Value
        Dim dataParts = rawData.Split(","c)

        Dim locationData As New List(Of LocationDatum)

        ' A simple way of parsing the data
        For i = 0 To dataParts.Count - 1 Step 3
            If i + 2 < dataParts.Count Then
                Dim id As Integer = -1
                Dim country As String = ""
                Dim city As String = ""
                ' used to check all three required parts have been found:
                Dim partsFoundFlags = 0
                For j = 0 To 2
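                    Dim itemParts = dataParts(i + j).Split(":"c)
                    ' Skip items that are not in "key:value" form rather than indexing past the array
                    If itemParts.Length < 2 Then Continue For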
                    Select Case itemParts(0)
                        Case "id"
                            id = CInt(itemParts(1))
                            partsFoundFlags = partsFoundFlags Or 1
                        Case "country"
                            country = itemParts(1)
                            partsFoundFlags = partsFoundFlags Or 2
                        Case "city"
                            city = itemParts(1)
                            partsFoundFlags = partsFoundFlags Or 4
                    End Select
                Next
                If partsFoundFlags = 7 Then
                    locationData.Add(New LocationDatum With {.ID = id, .Country = country, .City = city})
                End If
            End If

        Next

        For Each d In locationData
            Console.WriteLine(d)
        Next

        Console.ReadLine()

    End Sub

End Module

Which outputs:

ID=12, Country=usa, City=oregon
ID=13, Country=usa, City=atlanta

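It is resistant to some data malformations: id, city and country may appear in any order within each group of three, and an incomplete or unrecognised group at the end is ignored. It does assume that the items arrive in consecutive groups of three.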

In real code, you would put the parsing code into its own function.
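One way that extraction might look, as a sketch rather than a definitive implementation: the parsing loop is moved into a function that takes the raw attribute string and returns the list. The names LocationParser and ParseLocationData are hypothetical, and two things differ from the code above: Integer.TryParse replaces CInt so a non-numeric id skips the group instead of throwing, and keys and values are trimmed. It does not depend on HtmlAgilityPack.

```vbnet
Option Infer On
Option Strict On

Imports System
Imports System.Collections.Generic

Public Class LocationDatum
    Public Property ID As Integer
    Public Property Country As String
    Public Property City As String
End Class

Public Module LocationParser

    ' Hypothetical helper: turns "id:12,country:usa,city:oregon,..." into a list.
    Public Function ParseLocationData(rawData As String) As List(Of LocationDatum)
        Dim result As New List(Of LocationDatum)
        If String.IsNullOrWhiteSpace(rawData) Then Return result

        Dim dataParts = rawData.Split(","c)

        ' Walk the items in groups of three; an incomplete trailing group is skipped.
        For i = 0 To dataParts.Length - 3 Step 3
            Dim datum As New LocationDatum()
            ' Bit flags recording which of the three required keys were seen
            Dim partsFoundFlags = 0

            For j = 0 To 2
                Dim itemParts = dataParts(i + j).Split(":"c)
                ' Ignore anything that is not exactly "key:value"
                If itemParts.Length <> 2 Then Continue For

                Dim value = itemParts(1).Trim()
                Select Case itemParts(0).Trim()
                    Case "id"
                        Dim id As Integer
                        If Integer.TryParse(value, id) Then
                            datum.ID = id
                            partsFoundFlags = partsFoundFlags Or 1
                        End If
                    Case "country"
                        datum.Country = value
                        partsFoundFlags = partsFoundFlags Or 2
                    Case "city"
                        datum.City = value
                        partsFoundFlags = partsFoundFlags Or 4
                End Select
            Next

            ' Keep the group only if id, country and city were all found
            If partsFoundFlags = 7 Then result.Add(datum)
        Next

        Return result
    End Function

End Module
```

With this in place, Sub Main would only need to call ParseLocationData(dataNode.Attributes("data").Value) and loop over the result.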