C#: How to iterate over all TR and TD elements of multiple HTML tables and extract values


I want to parse multiple HTML tables using HtmlAgilityPack. I want to iterate over every TD and extract its value.

Please have a look at this URL: https://www.sec.gov/Archives/edgar/data/1108134/000110813423000018/bhlb-20230630.htm

The page at the above URL contains a lot of tabular data, and I have to parse its HTML tables.

If anyone inspects the HTML table code, they will notice that sometimes a TD holds its value directly, sometimes the TD contains a SPAN tag that holds the value, and sometimes the SPAN contains a custom tag, <ix:nonfraction>, that holds the value.

I tried the code below to iterate over all TR and TD elements, but no luck. My objective is to iterate over all HTML tables and extract the values from each TD.

This is the code I tried:

var table1 = htmlDoc.DocumentNode.SelectNodes("//table");
var tbody = table1.ChildNodes["tbody"];
var lst = new List<Table1>();
foreach (var row in tbody.ChildNodes.Where(r => r.Name == "tr"))
{
    var tbl1 = new Table1();
    var columnsArray = row.ChildNodes.Where(c => c.Name == "td").ToArray();
    for (int i = 0; i < columnsArray.Length; i++)
    {
        if (i == 0)
            tbl1.Course = columnsArray[i].InnerText.Trim();
        if (i == 1)
            tbl1.Count = columnsArray[i].InnerText.Trim();
        if (i == 2)
            tbl1.Correct = columnsArray[i].InnerText.Trim();
    }
    lst.Add(tbl1);
}
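
For reference, here is a minimal sketch of the behaviour I am after, not a verified solution. As far as I can tell, SelectNodes returns a collection of nodes rather than a single node, so the attempt above cannot read `ChildNodes` from it directly; the sketch loops over each table instead. It relies on InnerText returning the text of nested tags, which should cover the plain TD, the SPAN and the <ix:nonfraction> cases alike. The `TableExtractor` class and its `ExtractTables` method are hypothetical names introduced for illustration, and the code assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class TableExtractor
{
    // Returns one list per table, one inner list per row, one string per cell.
    public static List<List<List<string>>> ExtractTables(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<List<List<string>>>();

        // Guard against a null result when no <table> matches.
        var tables = doc.DocumentNode.SelectNodes("//table");
        if (tables == null)
            return result;

        foreach (var table in tables)
        {
            var rows = new List<List<string>>();

            // Descendants("tr") finds rows whether or not a <tbody> is present.
            foreach (var row in table.Descendants("tr"))
            {
                var cells = row.ChildNodes
                    .Where(c => c.Name == "td" || c.Name == "th")
                    // InnerText concatenates nested text; DeEntitize decodes entities such as &amp;.
                    .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                    .ToList();

                // Skip rows that contain no cells at all.
                if (cells.Count > 0)
                    rows.Add(cells);
            }

            result.Add(rows);
        }

        return result;
    }
}
```

Calling `TableExtractor.ExtractTables(htmlString)` with the downloaded page source would give the cell text grouped by table and row. One caveat I am aware of: if tables are nested, the rows of an inner table would be reported both under the outer table and under the inner one, so that case would need extra filtering.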

Any kind of help or direction would be a great assistance. Thanks.
