Handling of character references in an embedded SVG's script tags


This is an XSS payload:

<svg><script>&#x61;&#x6c;&#x65;&#x72;&#x74;&#x28;&#x31;&#x29;</script></svg>

The browser decodes the content between the <script> tags to alert(1) and executes it.
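To make the decoding step concrete, here is a minimal sketch of what the numeric character references in the payload resolve to. The function name decodeNumericRefs is made up for illustration, and it only handles numeric references (not named ones like &amp;amp;), so it is a simplification rather than what a browser actually runs.

```javascript
// Hypothetical helper: decode hexadecimal (&#x61;) and decimal (&#97;)
// numeric character references. Named references are deliberately ignored.
function decodeNumericRefs(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) =>
    String.fromCodePoint(hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10))
  );
}

const payload = "&#x61;&#x6c;&#x65;&#x72;&#x74;&#x28;&#x31;&#x29;";
console.log(decodeNumericRefs(payload)); // alert(1)
```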

But if I don't wrap it in an <svg> tag, the character references are not decoded and nothing is executed. Can anyone tell me why this happens? How does the <svg> tag change the parsing?


There are 2 best solutions below

1
On BEST ANSWER

In HTML content, character references inside a script element are not decoded by the HTML parser; the HTML5 specification defines this explicitly.

HTML5 has a separate script tokenisation mode (the script data state), one of several tokenisation modes that vary with context. The script mode does not decode character references; some of the other modes do.

SVG is based on XML, where the rules are much simpler and more straightforward. Basically, character references are allowed anywhere in text content, because there are no context-sensitive parsing modes of this kind.

For SVG in HTML, the HTML specification says:

The svg element from the SVG namespace falls into the embedded content, phrasing content, and flow content categories for the purposes of the content models in this specification.

In other words, parse all SVG text as phrasing content. All SVG is a single custom tokenisation mode for the HTML 5 parser.

0
On

As I wasn't really satisfied with the other answer's citations on the reasoning behind this behaviour, I raised this 'issue' on the WHATWG mailing list, since it does open some possible (albeit small) security loopholes. To quote Ian Hickson (editor of the HTML standard at WHATWG) verbatim:

It's not great, but it is intentional. Within <svg> and <math> blocks, we use the "foreign content" parsing mode wherein parsing is much more similar to legacy XML parsing than legacy HTML parsing:

https://html.spec.whatwg.org/#parsing-main-inforeign

Note in particular that the special behaviour for <script> here doesn't include changing the tokeniser mode, like it would in non-foreign content.

So while Robert's answer is essentially a collection of correct quotes pertaining to standalone HTML5 and SVG content, there is a specific, separate section on the parsing of 'foreign content' that explains this behaviour. Ian agrees it's not really a perfect solution, but honestly I can't think of one either that is compatible with both "semi-SGML" and XML parsing.
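The difference can be summarised with a toy model. This is only a sketch of the behaviour described above, not the tokeniser state machine from the specification; the names decodeRefs and scriptTextAfterParsing are invented for illustration, and only numeric references are handled.

```javascript
// Hypothetical helper: decode numeric character references only.
function decodeRefs(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) =>
    String.fromCodePoint(hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10))
  );
}

// Toy model: returns the text a <script> element ends up containing.
// inForeignContent = true stands for a <script> inside <svg> or <math>.
function scriptTextAfterParsing(rawText, inForeignContent) {
  // HTML content: the tokeniser switches to its script mode, which keeps
  // "&#x61;" as six literal characters.
  // Foreign content: no mode switch happens, so references are decoded
  // as they would be in ordinary text.
  return inForeignContent ? decodeRefs(rawText) : rawText;
}

const raw = "&#x61;&#x6c;&#x65;&#x72;&#x74;&#x28;&#x31;&#x29;";
console.log(scriptTextAfterParsing(raw, false)); // unchanged, not valid JavaScript
console.log(scriptTextAfterParsing(raw, true));  // alert(1)
```

In the plain HTML case the script engine receives the undecoded string, which is a syntax error rather than a call to alert, so nothing runs.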