In our enterprise software we allow customers to supply their own HTML to customize the ‘Contact’ and ‘Legal’ pages of the app. It's a nice feature, but since our app knows nothing specific about the application that actually provides the HTML, I was wondering how to approach this safely. I have read some blog articles and SO posts and watched some videos, but those only explain the danger of HTML injection, or how to insert HTML with createElement, innerHTML, or other direct approaches.
I am looking for the safest approach to displaying HTML I have no direct control over. Pointers to any article or library would be greatly appreciated.
You should sanitize the HTML code entered by your users.
One way to do it is by defining a whitelist: a list of tags (and, for each tag, optionally a list of allowed attributes) that are permitted in the output HTML. Any tag that is not explicitly allowed is removed by the sanitizer.
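To make the whitelist idea concrete, here is a minimal sketch in plain Java. The class name `WhitelistSanitizer`, the contents of `ALLOWED`, and the helper `isSafeUrl` are all hypothetical choices made for this illustration. It only recognizes tags whose attributes are double-quoted and escapes everything else, so treat it as a demonstration of the principle rather than something to deploy; for real use, prefer a vetted sanitizer library.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WhitelistSanitizer {

    // Hypothetical whitelist: tag name -> attributes allowed on that tag.
    private static final Map<String, List<String>> ALLOWED = Map.of(
            "p", List.of(),
            "b", List.of(),
            "i", List.of(),
            "br", List.of(),
            "a", List.of("href", "title"));

    // A start or end tag whose attributes (if any) are all double-quoted.
    private static final Pattern TAG = Pattern.compile(
            "<(/?)([a-zA-Z][a-zA-Z0-9]*)((?:\\s+[a-zA-Z-]+\\s*=\\s*\"[^\"]*\")*)\\s*/?>");
    private static final Pattern ATTR =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*\"([^\"]*)\"");

    static String sanitize(String html) {
        StringBuilder out = new StringBuilder();
        Matcher tag = TAG.matcher(html);
        int pos = 0;
        while (tag.find()) {
            // Everything between recognized tags is emitted as escaped text.
            out.append(escape(html.substring(pos, tag.start())));
            pos = tag.end();

            String name = tag.group(2).toLowerCase(Locale.ROOT);
            List<String> allowedAttrs = ALLOWED.get(name);
            if (allowedAttrs == null) {
                continue; // tag not on the whitelist: drop it
            }
            if (!tag.group(1).isEmpty()) {
                out.append("</").append(name).append('>'); // end tag
                continue;
            }
            out.append('<').append(name);
            Matcher attr = ATTR.matcher(tag.group(3));
            while (attr.find()) {
                String attrName = attr.group(1).toLowerCase(Locale.ROOT);
                String value = attr.group(2);
                if (!allowedAttrs.contains(attrName)) {
                    continue; // attribute not allowed on this tag
                }
                if (attrName.equals("href") && !isSafeUrl(value)) {
                    continue; // e.g. javascript: URLs
                }
                out.append(' ').append(attrName)
                   .append("=\"").append(escape(value)).append('"');
            }
            out.append('>');
        }
        out.append(escape(html.substring(pos)));
        return out.toString();
    }

    // Only absolute http(s) and mailto URLs pass; relative URLs are dropped too.
    private static boolean isSafeUrl(String url) {
        String u = url.toLowerCase(Locale.ROOT);
        return u.startsWith("https://") || u.startsWith("http://") || u.startsWith("mailto:");
    }

    // Simplification: entities already present in the input get escaped again.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&#39;");
    }

    public static void main(String[] args) {
        System.out.println(sanitize(
                "<p onclick=\"evil()\">Hi <b>there</b><script>alert(1)</script></p>"));
        // prints: <p>Hi <b>there</b>alert(1)</p>
    }
}
```

Note that the text inside a removed tag (such as the body of a `script` element) survives as harmless escaped text here, and the sketch does not balance or nest tags. Dedicated sanitizers handle those cases, along with unquoted attributes, entities, and CSS.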
There are plenty of whitelists already available. For example, you can use the one for the TinyMCE HTML editor (see here).
This policy is designed to sanitize the HTML entered in an HTML editor, so it may fit your needs. You can also find more policies here.
If you use Java, you can apply that policy with java-html-sanitizer, or define a custom one.