My co-worker introduced the HTML Sanitizer for .NET to our APIs. I'm speaking of this one: https://github.com/mganss/HtmlSanitizer
He felt it was best to remove dangerous text/scripts from all input, a sort of wall, if you will, in front of our database. I've noticed a few things that I'd like to stop it from doing:
1. Imagine this input: Stack>;<Overflow
The Sanitizer will take that and convert it to Stack>;
In other words, it takes <Overflow and, because it sees the < sign (though there's no closing > sign), treats it like a tag that it should remove. I see that you can configure the Sanitizer with additional "AllowedTags", but anything could follow the < sign, not just "Overflow". I can't keep adding AllowedTags indefinitely. Can't I get it to remove only the tags that are on a list of dangerous tags?
Of note: Stack>;< Overflow (notice the space) is left as Stack>;< Overflow, which is what we want. It's only <Overflow (with the O right up against the < sign) that causes issues.
2. Imagine this input: Stack>;<b>
The Sanitizer will take that and convert it to Stack>;<b></b>
It sees the <b> and wants to add a closing </b> to make it valid HTML?
But that's not why we want to use this. If the user enters <b> (no spaces), we want it to be stored in the database exactly like that.
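For reference, here is a minimal sketch of how the cases above can be reproduced. It assumes the HtmlSanitizer NuGet package with its default configuration and the `Ganss.Xss` namespace (older releases used `Ganss.XSS`); the class name `SanitizerRepro` is only for illustration. I have deliberately not hard-coded expected outputs, since they may differ between versions.

```csharp
using System;
using Ganss.Xss; // assumption: recent package version; older ones use Ganss.XSS

class SanitizerRepro
{
    static void Main()
    {
        // Default configuration, no custom AllowedTags.
        var sanitizer = new HtmlSanitizer();

        // The three inputs described in the question.
        string[] inputs =
        {
            "Stack>;<Overflow",   // case 1: "<" directly followed by a letter
            "Stack>;< Overflow",  // case 1 variant: space after "<"
            "Stack>;<b>"          // case 2: unclosed but allowed tag
        };

        foreach (var input in inputs)
        {
            string output = sanitizer.Sanitize(input);

            // Print input and output side by side, and flag any change.
            Console.WriteLine($"in:  {input}");
            Console.WriteLine($"out: {output}");
            Console.WriteLine(output == input ? "unchanged" : "modified");
            Console.WriteLine();
        }

        // The workaround mentioned in case 1: allowing one tag name only
        // covers that single name, so it does not scale to arbitrary text.
        sanitizer.AllowedTags.Add("overflow");
        Console.WriteLine(sanitizer.Sanitize("Stack>;<Overflow"));
    }
}
```

What I'm after is every line reporting "unchanged" unless the input contains something from a list of genuinely dangerous tags.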