How to remove all HTML tags with tidy

I searched for an HTML parser and came up with tidy. Now that I have installed it, I can't find out how to strip all HTML tags (and also JavaScript functions, if that's possible). The example code turns HTML into XHTML, and I'm starting to get the feeling that I have downloaded an unsuitable package. I couldn't find any documentation or manuals that explain it either.

Any suggestions on how this might be done with tidy?

EDIT: As I understand it, tidy is an HTML parser. What I am trying to achieve is to leave only the plain text, i.e. <h3>Test</h3> should come out as Test

There is 1 solution below

Tidy is basically used to clean up HTML pages. You can pass the output of Tidy to libxml++ to parse the generated XHTML.

For a working example of using libxml++, see "Parsing a XHTML using libxml++". You can use any one of its three parsers (DOM, SAX or TextReader) to parse the string and get only the text, without any tags.
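
To make the goal concrete (only the text survives, tags and script bodies are dropped), here is a minimal dependency-free sketch in C++. Note that this is not Tidy or libxml++: it is a naive character scanner swapped in purely for illustration, and the names strip_tags and matches_at are hypothetical helpers, not part of either library. It assumes simple input with no '>' inside attribute values, and it neither handles comments nor decodes entities such as &amp;, which is why a real parser as described above is the more robust route.

```cpp
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical helper: case-insensitive check that `html` contains the
// lowercase string `word` starting at index `pos`.
static bool matches_at(const std::string& html, std::size_t pos,
                       const std::string& word) {
    if (pos + word.size() > html.size()) return false;
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(html[pos + i])) != word[i])
            return false;
    }
    return true;
}

// Hypothetical helper: drops everything between '<' and '>' and also skips
// the bodies of <script> and <style> elements. Not a real HTML parser.
std::string strip_tags(const std::string& html) {
    std::string text;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] != '<') {  // ordinary character: keep it
            text += html[i++];
            continue;
        }
        // Opening <script> or <style>: jump to its closing tag so the
        // element body is dropped along with the markup.
        for (const char* name : {"script", "style"}) {
            if (matches_at(html, i + 1, name)) {
                const std::string closing = std::string("</") + name;
                std::size_t j = i + 1;
                while (j < html.size() && !matches_at(html, j, closing)) ++j;
                i = j;
                break;
            }
        }
        // Skip the tag itself, up to and including the next '>'.
        while (i < html.size() && html[i] != '>') ++i;
        if (i < html.size()) ++i;
    }
    return text;
}
```

With this sketch, strip_tags("<h3>Test</h3>") returns Test, matching the example in the question. Running Tidy first still helps, since cleaned-up XHTML is far less likely to contain the malformed markup that trips up a scanner like this.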