How to remove all HTML tags with tidy

Asked by antonpuz on 03 December 2014 at 19:39

I searched for an HTML parser and came up with tidy. Now that I have installed it, I can't find how to strip all HTML tags (and also JavaScript functions, if possible). The example code turns HTML into XHTML, and I'm starting to feel that I have downloaded an unsuitable package; I couldn't find any documentation or manuals that explain it either.

Any suggestions on how this might be done with tidy?

EDIT: As I understand it, tidy is an HTML parser. What I am trying to achieve is to leave only the plain text, i.e. <h3>Test</h3> should become Test.

1 Answer

Rudra Murthy On 11 March 2015 at 14:57

Tidy is basically used to clean up HTML pages. You can send the output of Tidy to libxml++ to parse the generated XHTML.

For a working example of using libxml++, see the linked question "Parsing a XHTML using libxml++". You can use one of the three parsers to parse the string and get only the text, without any tags.
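To make the "get only text without any tags" step concrete, here is a rough sketch of what the extraction does to a string. Note that this is not libxml++ and not part of Tidy: strip_tags is a hypothetical, hand-rolled helper that uses only the standard library. It assumes the input is already well-formed XHTML with lower-case element names (the kind of output Tidy is meant to produce) and that attribute values contain no literal '>'. It also leaves entities such as &amp; undecoded, which a real parser would handle for you.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Tidy or libxml++ function): returns the text
// content of well-formed XHTML, dropping tags, comments, and the bodies of
// <script> and <style> elements.
std::string strip_tags(const std::string& xhtml) {
    std::string out;
    std::size_t i = 0;
    while (i < xhtml.size()) {
        if (xhtml[i] != '<') {           // ordinary text: keep it
            out += xhtml[i++];
            continue;
        }
        // Comments may contain '>', so skip them as a single unit.
        if (xhtml.compare(i, 4, "<!--") == 0) {
            std::size_t end = xhtml.find("-->", i + 4);
            i = (end == std::string::npos) ? xhtml.size() : end + 3;
            continue;
        }
        std::size_t close = xhtml.find('>', i);
        if (close == std::string::npos) break;   // unterminated tag: stop

        // Read the element name (stays empty for closing tags like </h3>).
        std::string name;
        for (std::size_t j = i + 1;
             j < close && std::isalnum(static_cast<unsigned char>(xhtml[j]));
             ++j) {
            name += xhtml[j];
        }
        bool self_closing = xhtml[close - 1] == '/';
        i = close + 1;                   // continue after the tag itself

        // For script/style, also discard everything up to the end tag.
        if ((name == "script" || name == "style") && !self_closing) {
            std::size_t end = xhtml.find("</" + name, i);
            std::size_t gt = (end == std::string::npos)
                                 ? std::string::npos
                                 : xhtml.find('>', end);
            i = (gt == std::string::npos) ? xhtml.size() : gt + 1;
        }
    }
    return out;
}
```

With this sketch, strip_tags("<h3>Test</h3>") gives back Test, which is the behaviour asked for in the question. For real pages I would still go through a proper parser such as libxml++ as described above, since it also deals with entities, CDATA sections and malformed input.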
