What to do with ASCII escape characters in user-generated markup?

I'm using HTML Purifier, a PHP "filter that guards against XSS and ensures standards-compliant output," to sanitize/standardize user-inputted markup.

This is an example of the user-inputted markup:

<font face="'Times New Roman', Times">TEST</font>

which generates:

<span style="font-family:&quot;Times New Roman&quot;, Times;">TEST</span>

I'm a bit confused, because &quot; isn't even the entity for a single quote. What's the best practice here, given that I'm going to be using this user-generated content later?

  • Leave as is
  • Replace all &quot; with \' after the purifier executes
  • Configure HTML Purifier differently
  • Something else?

There are 2 best solutions below

1
On BEST ANSWER

Looks okay to me.

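I think the single quotes turn into double quotes because HTML Purifier takes the entire tag apart and puts it back together according to its own rules, which happen to use double quotes when quoting values inside a style attribute. Because the attribute itself is delimited by double quotes, those inner quotes have to be written as &quot;; the browser decodes them back to " before it reads the CSS.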

It also validates fine for me. What doctype are you validating against?

If I'm not overlooking something, I'd say this is fine to use as is.
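
To see why the &quot; is harmless, here is a minimal sketch that decodes the attribute value from the question the way a browser would before handing it to the CSS parser. The variable names are made up for illustration, and html_entity_decode() only approximates the browser's decoding step:

```php
<?php
// Attribute value as it appears in HTML Purifier's output (copied from the question).
$attr = 'font-family:&quot;Times New Roman&quot;, Times;';

// A browser decodes entities in attribute values before parsing the CSS;
// html_entity_decode() mimics that step here.
$css = html_entity_decode($attr, ENT_QUOTES, 'UTF-8');
echo $css, "\n"; // font-family:"Times New Roman", Times;

// Going the other way: a double quote inside a double-quoted attribute
// has to be escaped, which gives back the purifier's form.
$escaped = htmlspecialchars($css, ENT_COMPAT, 'UTF-8');
var_dump($escaped === $attr); // bool(true)
```

So the decoded CSS quotes the font name with double quotes instead of single quotes, and both are valid in CSS.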

3
On

The output is XHTML-valid but the entity conversion is wrong. <img src="/test" alt="I'm ok"/> would get converted to <img src="/test" alt="I&quot;m ok">

Something simple will suffice:

$allowed_tags = '<font>';
// ENT_COMPAT escapes double quotes but leaves single quotes untouched.
echo htmlspecialchars(strip_tags(rawurldecode($input), $allowed_tags), ENT_COMPAT, 'UTF-8');

but it won't convert the <font> tag to <span>.
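
For anyone weighing this approach, here is a small sketch of what htmlspecialchars() does to the quotes under ENT_COMPAT versus ENT_QUOTES. The sample $input is an assumption, adapted from the markup in the question. Note that the angle brackets of the kept <font> tag get escaped as well, so the tag ends up displayed as text rather than rendered:

```php
<?php
// Hypothetical sample input, adapted from the markup in the question.
$input = "<font face=\"'Times New Roman', Times\">I'm ok</font>";

// ENT_COMPAT: double quotes become &quot;, single quotes stay as they are.
$compat = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES: single quotes are converted too (to &#039;).
$quotes = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $compat, "\n";
// &lt;font face=&quot;'Times New Roman', Times&quot;&gt;I'm ok&lt;/font&gt;
echo $quotes, "\n";
// &lt;font face=&quot;&#039;Times New Roman&#039;, Times&quot;&gt;I&#039;m ok&lt;/font&gt;
```

Whether that escaping of the tag itself is acceptable depends on whether the markup is meant to be rendered or only shown.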