Is it valid to write '<' and '>' in HTML5 with spaces surrounding them or must they always be written as HTML entities?


Is the following valid HTML5?

<p>1 < 2</p>
<p>2 > 1</p>

Or must this always be written using HTML5 entities like this?

<p>1 &lt; 2</p>
<p>2 &gt; 1</p>

Can someone help me answer this question with references to the HTML5 specification that clearly spells out whether or not it is valid to write < and > (spaces around the symbols) in HTML?


There are 2 best solutions below

5
myf On BEST ANSWER

> in intended text content is, and always has been, safe and valid in HTML, even without surrounding spaces.

< is technically invalid when it does not start a tag in a context where tags are expected. Slightly simplified: when the parser encounters it in the "Data state", it switches to the "Tag open state", which expects either a valid tag name or another markup-related character (/ for a closing tag, or ! for a doctype or comment).

A valid HTML tag name must start with an ASCII letter ([a-z], case-insensitive), so encountering a space there instead results in the parse error "invalid-first-character-of-tag-name", which instructs the parser to handle it so that

such code point and a preceding U+003C (<) is treated as text content, and all content that follows is treated as markup.

So, like all similar syntax errors in HTML, it has a clear canonical recovery that conforming parsers have to follow. It is flagged as an error, yet its outcome is predictable and standardized, so one might consider it 'safe' to rely on: when < is followed by a bad character such as a space, the parser returns to text content ("Data state"), adds the < to the text, and then processes the bad character as ordinary text. In the end it is displayed the same way as if it had been written as &lt;.
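The recovery described above can be illustrated with a small Python sketch. It is not the tokenizer from the specification, only a heavily simplified model of the hand-off between the "Data state" and the "Tag open state"; the function name tokenize_text and the token tuples are made up for this illustration, and comments, doctypes, attributes and character references are deliberately out of scope.

```python
import string


def tokenize_text(html):
    """Simplified model of the Data state / Tag open state hand-off.

    Returns (tokens, errors). Assumption: anything starting with "</" is
    treated as an end tag, which is looser than the real specification.
    """
    tokens, errors, text = [], [], []
    i = 0
    while i < len(html):
        ch = html[i]
        if ch != "<":
            # Data state: ordinary characters (including ">") are text.
            text.append(ch)
            i += 1
            continue
        # Tag open state: inspect the code point that follows "<".
        nxt = html[i + 1] if i + 1 < len(html) else ""
        if nxt == "":
            errors.append("eof-before-tag-name")
            text.append("<")
            i += 1
            continue
        if nxt in "!?":
            raise ValueError("comments and doctypes are out of scope here")
        if nxt == "/":
            kind, start = "end_tag", i + 2
        elif nxt in string.ascii_letters:
            kind, start = "start_tag", i + 1
        else:
            # Parse error: emit "<" as text and reprocess nxt as text.
            errors.append("invalid-first-character-of-tag-name")
            text.append("<")
            i += 1
            continue
        end = html.find(">", start)
        if end == -1:
            end = len(html)  # unterminated tag: consume the rest
        if text:
            tokens.append(("text", "".join(text)))
            text = []
        tokens.append((kind, html[start:end]))
        i = end + 1
    if text:
        tokens.append(("text", "".join(text)))
    return tokens, errors


print(tokenize_text("<p>1 < 2</p>"))
print(tokenize_text("<p>2 > 1</p>"))
```

In this model both inputs produce a single text token between the start and end tags; only the first one also records the parse error, which mirrors the "invalid but predictable" behaviour described above.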

You can verify that by validating this sample document

<!doctype html><html lang="en">

<title>a > b < c</title>

<p>a > b < c</p>

<textarea>a > b < c</textarea>

in validator.w3.org/nu/. It yields:

Error: Bad character  after <.
Probable cause: Unescaped <.
Try escaping it as `&lt;`. At line 5, column 11

N.B. in title and textarea, < is OK, since these elements cannot contain any nested non-text nodes (not even comments): they are specified as escapable raw text elements, whose content is parsed as text only.

2
mplungjan On

These are valid

<p>1 &lt; 2</p>
<p>2 &gt; 1</p>
<p>2 > 1</p>

This is TOLERATED in all browsers

<p>1 < 2</p>

but is in principle an invalid-first-character-of-tag-name error

This error occurs if the parser encounters a code point that is not an ASCII alpha where first code point of a start tag name or an end tag name is expected. If a start tag was expected such code point and a preceding U+003C (<) is treated as text content, and all content that follows is treated as markup. Whereas, if an end tag was expected, such code point and all content that follows up to a U+003E (>) code point (if present) or to the end of the input stream is treated as a comment.

The W3C validator (validator.w3.org/nu/) will flag the < as invalid.
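When the text comes from a program rather than being typed by hand, the simplest way to stay on the valid side is to escape it before it is placed into markup. Here is a minimal Python sketch using the standard library's html.escape; the helper name text_to_paragraph is hypothetical and only serves this example.

```python
import html


def text_to_paragraph(text):
    """Wrap plain text in a <p> element, escaping &, < and >."""
    # quote=False leaves " and ' untouched, which is fine for element
    # content (attribute values would need quote=True).
    return "<p>" + html.escape(text, quote=False) + "</p>"


print(text_to_paragraph("1 < 2"))  # <p>1 &lt; 2</p>
print(text_to_paragraph("2 > 1"))  # <p>2 &gt; 1</p>
```

Note that html.escape also rewrites > as &gt;, which is not required for validity but is harmless.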

Here are other related issues

<script>
  const htmlString = `</script>` // this ends the script element early; write `<\/script>` instead (entities are not decoded inside script elements)
</script>

and

<textarea>
  Here is an end tag: </textarea>
</textarea>

and

<p>
  Here is an end tag: </p>
</p>

The specification states the relevant restriction as follows:

13.1.2.6 Restrictions on the contents of raw text and escapable raw text elements

The text in raw text and escapable raw text elements must not contain any occurrences of the string </ (U+003C LESS-THAN SIGN, U+002F SOLIDUS) followed by characters that case-insensitively match the tag name of the element followed by one of U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or U+002F SOLIDUS (/).
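The restriction quoted above is mechanical enough to check with a regular expression. The following Python sketch is one possible reading of that rule, not an official conformance check; the function name violates_raw_text_restriction is invented for this example.

```python
import re


def violates_raw_text_restriction(tag_name, text):
    """Return True if text contains "</" + tag_name (case-insensitive)
    followed by tab, LF, FF, CR, space, ">" or "/"."""
    pattern = re.compile(
        r"</" + re.escape(tag_name) + r"[\t\n\f\r />]",
        re.IGNORECASE,
    )
    return pattern.search(text) is not None


# The script example from above: the raw end tag is a problem,
# the backslash-escaped variant is not.
print(violates_raw_text_restriction("script", "const s = `</script>`"))
print(violates_raw_text_restriction("script", "const s = `<\\/script>`"))
```

A check like this could be run on content before it is embedded into a script, style, textarea or title element.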