Why does PHP strip_tags("is<isvery interesting <thatthis willbe stripped") return "is"?

I am writing a character-counting function for input from a TinyMCE textarea. The server-side validation uses code like this:

$string = "is<isvery interesting <thatthis willbe stripped";
$stripped = strip_tags($string);
$count = strlen($stripped); // $count is 2, because $stripped is "is"

Note that $string contains no tags at all, yet strip_tags() strips everything from the first less-than sign onward.

Is this a bug or a feature?

There are 2 best solutions below

0
On BEST ANSWER

This behavior is documented:

Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.

http://php.net/manual/en/function.strip-tags.php

3
On

strip_tags is actually quite dumb. It strips anything that even remotely looks like an HTML tag: that is, everything from a < followed by an alphanumeric character up to the closing >, or as far as it can get if there is no closing >.
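A small sketch may make this concrete. It compares the string from the question with a well-formed one and with a less-than sign followed by a space. The variable names are only illustrative, and the last case reflects how strip_tags() behaves when called without an allowed-tags argument, as far as I know.

```php
<?php
// The string from the question: "<i..." looks like the start of a tag,
// and since no ">" ever follows, everything after "is" is discarded.
$fromQuestion = strip_tags("is<isvery interesting <thatthis willbe stripped");

// A well-formed tag pair is removed and the text around it is kept.
$wellFormed = strip_tags("is <b>very</b> interesting");

// A "<" followed by whitespace is not treated as a tag opener.
$spaced = strip_tags("1 < 2");

var_dump($fromQuestion); // string(2) "is"
var_dump($wellFormed);   // string(19) "is very interesting"
var_dump($spaced);       // string(5) "1 < 2"
```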

In this context the observed behavior looks like a bug. However, strip_tags is not meant to do error correction on input HTML. Its purpose is to strip away markup so that the remainder is safe to embed in a web page. When in doubt, it strips more rather than less, which is a good thing.
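For the character count itself, one possible approach is sketched below. It assumes the editor submits real HTML in which a literal less-than sign in the text arrives entity-encoded as &lt;, so strip_tags() only ever sees genuine tags. The function name countVisibleChars is hypothetical, and the mb_strlen() call depends on the mbstring extension, so the sketch falls back to strlen() when it is missing.

```php
<?php
/**
 * Hypothetical helper: count the characters a user would see
 * in an HTML fragment submitted by a rich-text editor.
 * Assumes literal "<" in the text is encoded as "&lt;".
 */
function countVisibleChars(string $html): int
{
    // Remove the markup; entities such as &lt; are left untouched.
    $text = strip_tags($html);

    // Turn entities back into the characters they stand for,
    // so "&lt;" counts as one character rather than four.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Prefer a multibyte-aware length if mbstring is available.
    return function_exists('mb_strlen')
        ? mb_strlen($text, 'UTF-8')
        : strlen($text);
}

var_dump(countVisibleChars("<p>1 &lt; 2</p>"));           // int(5)
var_dump(countVisibleChars("<p>Hello <b>world</b></p>")); // int(11)
```

If the input can contain raw, unencoded less-than signs, as in the question, this still undercounts, because strip_tags() has already dropped the text by the time the length is taken.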