I've got a project that scrapes data from the SEC EDGAR site. Part of the task is to get the meat of each filing, and I was testing some of that today.
I ran into a fairly large filing (https://www.sec.gov/Archives/edgar/data/355437/000119312520189547/0001193125-20-189547.txt) that's about 110 MB.
I was breaking the package up into its constituent <DOCUMENT> nodes and processing each one differently, based on the value of its <FILENAME> node. For the HTML/XML-based types, I just used
SgmlReader.ReadInnerXml();
to grab the contents, but on this large filing it appears to go into an infinite loop. It ran for 15 minutes before I broke in with the debugger, and it was stuck on that call.
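For comparison, the splitting step can be done with plain string searches and no SGML parsing at all. The sketch below assumes the usual EDGAR full-submission layout: each <DOCUMENT> block has a single-line <FILENAME> header and a <TEXT>...</TEXT> body. `EdgarSplitter` and `Split` are hypothetical names, not part of SgmlReader. Because the search position only moves forward, malformed markup inside a document cannot make it loop.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits an EDGAR full-submission .txt into
// (FileName, Body) pairs using IndexOf only, with no SGML parsing.
public static class EdgarSplitter
{
    public static IEnumerable<(string FileName, string Body)> Split(string submission)
    {
        int pos = 0;
        while (true)
        {
            int start = submission.IndexOf("<DOCUMENT>", pos, StringComparison.Ordinal);
            if (start < 0) yield break;                 // no more documents
            int end = submission.IndexOf("</DOCUMENT>", start, StringComparison.Ordinal);
            if (end < 0) yield break;                   // truncated file: stop, don't spin
            string doc = submission.Substring(start, end - start);
            pos = end + "</DOCUMENT>".Length;           // always advances

            yield return (HeaderValue(doc, "<FILENAME>"), Between(doc, "<TEXT>", "</TEXT>"));
        }
    }

    // Rest of the line after a header tag such as <FILENAME>, trimmed (handles \r\n).
    static string HeaderValue(string doc, string tag)
    {
        int i = doc.IndexOf(tag, StringComparison.Ordinal);
        if (i < 0) return "";
        i += tag.Length;
        int eol = doc.IndexOf('\n', i);
        return (eol < 0 ? doc.Substring(i) : doc.Substring(i, eol - i)).Trim();
    }

    // Text between the first opening tag and the following closing tag.
    static string Between(string doc, string open, string close)
    {
        int i = doc.IndexOf(open, StringComparison.Ordinal);
        if (i < 0) return "";
        i += open.Length;
        int j = doc.IndexOf(close, i, StringComparison.Ordinal);
        return j < 0 ? doc.Substring(i) : doc.Substring(i, j - i);
    }
}
```

Each body could then be handed to SgmlReader (or another HTML parser) one document at a time, e.g. `foreach (var (name, body) in EdgarSplitter.Split(File.ReadAllText(path)))`, which at least narrows the hang down to a specific <FILENAME>.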
Has anyone ever run into that before?
I'm using SgmlReader 1.8.16.
I saw a very old comment on a changelog page describing a similar bug caused by improperly terminated HTML comments, but it was listed as fixed a good number of releases ago.
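One cheap way to check whether this filing contains the kind of input that changelog entry described is to scan each document body for a `<!--` with no later `-->` before handing it to SgmlReader. This is a diagnostic sketch only, not a confirmed cause of the hang; `CommentCheck` and `FindUnterminatedComment` are hypothetical names.

```csharp
using System;

// Hypothetical diagnostic: returns the offset of the first "<!--" that has
// no later "-->", or -1 if every comment opener is closed.
public static class CommentCheck
{
    public static int FindUnterminatedComment(string html)
    {
        int pos = 0;
        while (true)
        {
            int open = html.IndexOf("<!--", pos, StringComparison.Ordinal);
            if (open < 0) return -1;                     // no more comments
            int close = html.IndexOf("-->", open + 4, StringComparison.Ordinal);
            if (close < 0) return open;                  // opener never closed
            pos = close + 3;                             // resume after this comment
        }
    }
}
```

A non-negative result points at the offending offset, which can be compared against where the debugger shows the reader stuck.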
Thanks