HTML Sanitizer for .NET - How to stop it from removing certain tags and forcing valid HTML

My co-worker introduced the HTML Sanitizer for .NET to our APIs. I'm speaking of this one: https://github.com/mganss/HtmlSanitizer

He felt it was best to remove dangerous text/scripts from all input, as a sort of wall, if you will, in front of our database. I've noticed a few things that I'd like to stop it from doing:

1. Imagine this input: Stack>;<Overflow

The Sanitizer will take that and convert it to Stack&gt;;

In other words, it takes <Overflow, and because it sees the < sign (though there's no closing > sign) it treats it as a tag and removes it. I see that you can configure the Sanitizer with additional "AllowedTags", but anything could follow the < sign, not just "Overflow", so I can't keep adding AllowedTags indefinitely. Can I get it to remove only the tags that are on a list of dangerous tags, and nothing else?

Of note: Stack>;< Overflow (notice the space) becomes Stack&gt;;&lt; Overflow, which is what we want. It's <Overflow (with the O right up against the < sign) that causes issues.

2. Imagine this input: Stack>;<b>

The Sanitizer will take that and convert it to Stack&gt;;<b></b>

It sees the <b> and wants to add a closing </b> to make it valid HTML?

But that's not why we want to use this. If the user enters <b> (no spaces), we want it stored in the database exactly like that.
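
For anyone who wants to reproduce this, here is a minimal sketch of the three cases above. It assumes a sanitizer with default settings (our real configuration may differ) and the HtmlSanitizer NuGet package; the class name Repro is just illustrative, and the namespace is Ganss.Xss in recent package versions (older versions use Ganss.XSS).

```csharp
using System;
using Ganss.Xss; // assumption: recent package version; older ones use Ganss.XSS

class Repro
{
    static void Main()
    {
        // Default configuration: no custom AllowedTags or other settings.
        var sanitizer = new HtmlSanitizer();

        // The three inputs described in the question.
        var inputs = new[]
        {
            "Stack>;<Overflow",   // case 1: "<Overflow" is dropped
            "Stack>;< Overflow",  // space after "<": kept as escaped text
            "Stack>;<b>",         // case 2: a closing </b> gets added
        };

        foreach (var input in inputs)
        {
            // Print each input next to its sanitized form for comparison.
            Console.WriteLine($"{input} => {sanitizer.Sanitize(input)}");
        }
    }
}
```

The output should match what is described above for each case, though the exact result may vary with the package version.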
