.NET code to convert HTML to text


I'm writing a small algorithm that fetches text from websites and then finds answers in it (I'll post the script once it's completed).

To do that, I need to convert all HTML code within and into plain, readable English text.

I've manually removed all the HTML tags, but some CSS entries are hard to get rid of. Are there any simple ways to convert HTML to plain English text?

Thanks.


There are 2 best solutions below

2
On BEST ANSWER

Someone has already done all the work for you.

0
On

I developed something similar that avoids Regex's performance penalty: a strip_tags equivalent for ASP.NET (it can be used in desktop .NET assemblies too).
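
For illustration only, here is a rough sketch of what a regex-free tag stripper could look like in C#. It is not the code from the answer above; `HtmlToText`, `Strip`, `StartsWithTag` and `CollapseWhitespace` are hypothetical names made up for this example. It scans the string once, drops `<script>` and `<style>` blocks together with their contents (which is probably where the leftover CSS entries come from), skips comments and tags, and then decodes entities with `WebUtility.HtmlDecode`. It assumes reasonably well-formed markup: a bare `<` in the text is treated as the start of a tag, and inline tags are replaced by a space, so `wor<b>ld</b>` comes out as `wor ld`.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper for illustration; not the implementation linked in the answer.
public static class HtmlToText
{
    public static string Strip(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        var sb = new StringBuilder(html.Length);
        int i = 0;
        while (i < html.Length)
        {
            char c = html[i];
            if (c != '<')
            {
                sb.Append(c);   // ordinary text: copy as-is
                i++;
                continue;
            }

            // Skip HTML comments in one jump, since they may contain '>'.
            if (string.CompareOrdinal(html, i, "<!--", 0, 4) == 0)
            {
                int endComment = html.IndexOf("-->", i + 4, StringComparison.Ordinal);
                if (endComment < 0) break;   // unterminated comment: drop the rest
                i = endComment + 3;
                continue;
            }

            // For <script> and <style>, jump to the closing tag so the contents are dropped too.
            string skipped = StartsWithTag(html, i, "script") ? "script"
                           : StartsWithTag(html, i, "style") ? "style"
                           : null;
            if (skipped != null)
            {
                int close = html.IndexOf("</" + skipped, i, StringComparison.OrdinalIgnoreCase);
                if (close < 0) break;        // unterminated block: drop the rest
                i = close;
            }

            // Skip to the end of the current tag and leave a space in its place.
            int end = html.IndexOf('>', i);
            if (end < 0) break;
            sb.Append(' ');
            i = end + 1;
        }

        // Turn entities such as &amp; and &nbsp; into characters, then tidy the spacing.
        return CollapseWhitespace(WebUtility.HtmlDecode(sb.ToString()));
    }

    // True if the tag opening at index lt ('<') has the given name (case-insensitive).
    private static bool StartsWithTag(string html, int lt, string name)
    {
        int start = lt + 1;
        if (start + name.Length > html.Length) return false;
        if (string.Compare(html, start, name, 0, name.Length,
                           StringComparison.OrdinalIgnoreCase) != 0) return false;
        int after = start + name.Length;
        return after == html.Length || !char.IsLetterOrDigit(html[after]);
    }

    // Collapses runs of whitespace into single spaces and trims both ends.
    private static string CollapseWhitespace(string text)
    {
        var sb = new StringBuilder(text.Length);
        bool pendingSpace = false;
        foreach (char ch in text)
        {
            if (char.IsWhiteSpace(ch))
            {
                pendingSpace = sb.Length > 0;   // never emit leading whitespace
            }
            else
            {
                if (pendingSpace) sb.Append(' ');
                sb.Append(ch);
                pendingSpace = false;
            }
        }
        return sb.ToString();
    }
}
```

Calling `HtmlToText.Strip("<style>p{color:red}</style><p>Hello &amp; <b>world</b></p>")` should give `Hello & world`. For messy real-world pages, a proper HTML parser is likely to be more robust than any hand-rolled scanner like this one.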