Does HTML Encoding have any cons?


I am developing a project on the ASP.NET MVC framework. All files and charsets are UTF-8. I'm using model binding, and in some of my models the display name property includes accented characters or single/double quotes.

Because the Razor engine automatically HTML-encodes the output of helpers (e.g. DisplayNameFor), the accented characters and quotes are encoded.

I could write custom helpers to render without encoding, but I would like to know whether HTML encoding has any drawbacks. I'm using UTF-8 and I want the text "Öger's tours" to be rendered as is. However, in the page source it is rendered as "&#214;ger&#39;s tours". I'm asking about this scenario.

(I've heard that search engine indexing performs better without encoded text. But I don't know why.)

Thank you.

There are 2 best solutions below

Best answer

I found a solution: use the AntiXSS library as the Razor encoderType. This answer describes it well: "Special characters in html output".

The default Razor encoder encodes accented characters, whereas the AntiXSS encoder does not. As a result, accented characters are rendered as they are.

Answer

The only characters that must be encoded as entities are <, which starts the opening and closing tags of HTML elements; &, which otherwise starts an HTML entity; and (within attribute values enclosed in double quotes) ", to avoid terminating the attribute prematurely. It is also a good idea to use the entity for > to avoid confusing parsers.

For everything else, it is enough to declare the correct character encoding and to actually save the HTML file in that encoding. In particular, there is no need to encode ' outside attribute values enclosed in single quotes, nor umlauts, ligatures, or other non-ASCII characters, as long as the file's charset supports them.
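
To make the rule above concrete, here is a minimal sketch in Python (not the Razor or AntiXSS implementation). The helper name escape_minimal is hypothetical; it escapes only the four characters discussed above and leaves accented characters and single quotes alone, on the assumption that the page is served as UTF-8 and attribute values are enclosed in double quotes. The last line checks that the numeric-reference form decodes back to the same text, so a browser displays both forms identically.

```python
import html


def escape_minimal(text: str) -> str:
    """Hypothetical helper: escape only &, <, > and the double quote."""
    # '&' must be replaced first so the entities added below are not re-escaped.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
    )


title = "Öger's tours"

# Accented characters and single quotes pass through unchanged.
print(escape_minimal(title))

# Markup-significant characters are still neutralised.
print(escape_minimal('<b>Tom & "Jerry"</b>'))

# Numeric character references decode back to the original text.
print(html.unescape("&#214;ger&#39;s tours") == title)
```

If the escaped value may end up inside an attribute enclosed in single quotes, ' would need escaping as well; Python's own html.escape does that by default.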