Is UTF-7 XSS still possible if the htmlspecialchars charset is not set, despite the HTTP header charset?

I want to double-check this, and I believe the answer will be helpful to others. If someone uses htmlspecialchars($var) in their code and is running a PHP version prior to 5.4, then they're open to UTF-7 XSS. That's a given. Am I correct in assuming that the site would still be open to UTF-7 XSS even if the character set in the HTTP Content-Type header is UTF-8, since the server-side character set of PHP defaults to ISO-8859-1?

Edit: I was asked what I hope to gain from this. I want to make sure a project isn't vulnerable to UTF-7 XSS, since some programmers don't seem inclined to set the third parameter of htmlspecialchars, which is the character set. If you understand the server character set I mentioned and how it relates to UTF-7, I could really use your help.

There are 2 answers below.

Best answer

Assuming that you are talking about outputting user-controlled values to the page, if the charset in the HTTP header is set to UTF-8 like so

Content-Type: text/html; charset=utf-8

then XSS cannot be achieved using UTF-7 encodings, because the browser decodes the page as UTF-8 and never interprets its bytes as UTF-7.
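In PHP, sending that header might look like the minimal sketch below. The helper name escape_html and the input name $_GET['name'] are assumptions made for illustration; the explicit third argument to htmlspecialchars() is there so the code does not rely on the pre-5.4 default mentioned in the question.

```php
<?php
// Hypothetical helper: escapes a user-controlled value for an HTML text context.
// The third argument pins the charset instead of relying on the PHP default.
function escape_html($value)
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Declare the charset in the HTTP header before any output is sent,
// so the browser has no reason to guess the encoding.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// $_GET['name'] is an assumed input name, used only for illustration.
$name = isset($_GET['name']) ? $_GET['name'] : 'guest';
echo '<p>Hello, ' . escape_html($name) . '</p>', "\n";
```

The header() call has to run before the script produces any output, which is why it sits ahead of the echo.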

Second answer

The charset parameter of htmlspecialchars() has no impact on UTF-7 attacks. The byte that has special meaning in UTF-7 is 0x2B (ASCII +), which starts an encoded sequence, and htmlspecialchars() never escapes it.
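To illustrate, the sketch below passes a UTF-7-style payload through htmlspecialchars() and then shows what a UTF-7 decoder would make of the result. The payload string is an illustrative assumption, and the decoding step assumes the mbstring extension is available.

```php
<?php
// Illustrative payload: "<script>alert(1)</script>" with < and > written as
// the UTF-7 sequences +ADw- and +AD4-.
$payload = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';

// htmlspecialchars() only rewrites &, <, > and quotes; none of them occur
// here, so the string passes through unchanged.
$escaped = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');
var_dump($escaped === $payload);

// What a UTF-7 decoder makes of the same bytes (requires mbstring).
if (function_exists('mb_convert_encoding')) {
    echo mb_convert_encoding($escaped, 'UTF-8', 'UTF-7'), "\n";
}
```

The comparison should report true, and the decoded string should be the script markup that the escaping was meant to neutralise. That only matters if something actually decodes the page as UTF-7.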

If you have a user string (in an ASCII-compatible encoding such as UTF-8) that you want to include on a web page that uses the UTF-7 encoding, then you have to convert that string using iconv('utf-8', 'utf-7', $str) after calling htmlspecialchars on the UTF-8 string. This charset conversion is a separate operation from HTML-escaping.
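The order of the two operations could be sketched as follows. The helper name escape_for_utf7_page is hypothetical, and the code assumes the iconv extension is loaded and that the underlying iconv library supports UTF-7, which is not guaranteed on every platform.

```php
<?php
// Hypothetical helper: escape first (on the UTF-8 string), then transcode.
// Assumes the iconv extension is loaded and its build supports UTF-7.
function escape_for_utf7_page($utf8)
{
    $escaped = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
    $utf7 = iconv('UTF-8', 'UTF-7', $escaped);
    if ($utf7 === false) {
        throw new RuntimeException('iconv could not convert to UTF-7');
    }
    return $utf7;
}

$out = escape_for_utf7_page('1 < 2');

// Converting back to UTF-8 lets you check that the HTML-escaped text
// survived the charset conversion.
echo iconv('UTF-7', 'UTF-8', $out), "\n";
```

The exact UTF-7 bytes can differ between iconv implementations, because the encoding leaves some characters optional to encode, so the sketch checks the round trip rather than a fixed byte string.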

In theory you could use htmlspecialchars($s, ENT_xxx, 'utf-7') to HTML-encode a string that was already in UTF-7 encoding, except that, unlike the iconv extension, the native-PHP htmlspecialchars function doesn't support UTF-7.

But the point is moot, because modern browsers won't allow you to use UTF-7 and no one ever deliberately authored a UTF-7 web page.

Real UTF-7 attacks happen not because of missing HTML-encoding, but because a browser treats a page as containing UTF-7 bytes when this was not intended. It's easy to stop that from happening by including an explicit charset declaration, either in the HTTP Content-Type header (as demonstrated by SilverlightFox, +1) or in a <meta> element included in the page before any user content.
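For the <meta> variant, a minimal sketch might look like this. The function name render_page and the page layout are assumptions for illustration; the point is only that the charset declaration appears in the output before anything derived from user input.

```php
<?php
// Hypothetical page template: the charset declaration comes first in <head>,
// ahead of anything derived from user input.
function render_page($userText)
{
    $safe = htmlspecialchars($userText, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . "<meta charset=\"utf-8\">\n"
        . "<title>Comment</title>\n"
        . "</head>\n<body>\n"
        . "<p>" . $safe . "</p>\n"
        . "</body>\n</html>\n";
}

// The UTF-7-style text is emitted as-is, but with the page declared as UTF-8
// it is just literal characters to the browser.
echo render_page('+ADw-script+AD4-alert(1)+ADw-/script+AD4-');
```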