I want to create a system that allows users to write articles. Each section of the article would be stored in a database. My concern is XSS and SQL injection, but mostly XSS.
This won't be a public thing, it'd just be for people who I trust to create content. However, I'd rather not leave a gaping hole in my security just because I trust people.
Now, I'd like to allow my users to use HTML in their articles, so I can't just remove every tag. It'd be nice to let them use JavaScript, but I'm willing to remove tags if I have to. If I do that, I'd also have to remove stuff like onclick="do something malicious".
I'm using parameterized queries, so I don't think I have to worry about SQL injection that much, but I'm not sure.
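To illustrate what parameterized queries do and don't cover, here is a minimal sketch using Python's built-in sqlite3 module. The `sections` table and the `save_section` / `load_sections` helpers are made up for the example and are not from the original setup. The values travel separately from the SQL text, so hostile input cannot change the statement. It is stored byte for byte, though, which means any markup in it still has to be dealt with before it is rendered.

```python
import sqlite3


def save_section(conn, article_id, body):
    # The "?" placeholders keep user-supplied values out of the SQL text.
    conn.execute(
        "INSERT INTO sections (article_id, body) VALUES (?, ?)",
        (article_id, body),
    )
    conn.commit()


def load_sections(conn, article_id):
    rows = conn.execute(
        "SELECT body FROM sections WHERE article_id = ? ORDER BY id",
        (article_id,),
    )
    return [row[0] for row in rows]


# Hypothetical schema, in memory, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sections (id INTEGER PRIMARY KEY, article_id INTEGER, body TEXT)"
)

hostile = "'); DROP TABLE sections; --<script>alert(1)</script>"
save_section(conn, 1, hostile)

# The table survives and the text comes back unchanged, <script> tag included.
print(load_sections(conn, 1) == [hostile])
```

In other words, the query layer handles SQL injection here, but it does nothing about XSS.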
I guess my question is, how far should I go with sanitizing my input and how should I do it?
Edit: I think I'd be fine with just removing JavaScript. Any special effects that users need can be handled with built-in JavaScript. It looks like I can use HTML Purifier to strip JavaScript, but it's a pretty big library and I think I might be able to write a smaller, more specific solution. I'd just go through each tag and attribute and check it against a whitelist. Would that work to prevent XSS?
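For reference, the whitelist idea could look roughly like the sketch below, written in Python with only the standard library. The tag, attribute, and scheme lists are assumptions picked for illustration, and `AllowlistSanitizer` / `sanitize` are hypothetical names. It rebuilds the output from parsed tokens instead of deleting bad parts from the input, drops anything not on the list, and also checks URL schemes, since an attribute whitelist alone would still let `href="javascript:..."` through. It does not balance tags or handle CSS, and hand-written filters are easy to get subtly wrong, so treat it as a sketch of the approach and not a replacement for a maintained library.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative lists only; adjust to what articles actually need.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li",
                "a", "h2", "h3", "blockquote", "code", "pre"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"http", "https", "mailto"}
VOID_TAGS = {"br"}
DROP_CONTENT = {"script", "style"}  # drop these tags and their contents


def _safe_url(url):
    # Browsers ignore whitespace and control characters inside a scheme,
    # so strip them before inspecting it.
    cleaned = "".join(ch for ch in url if ch > " ").lower()
    scheme, sep, _ = cleaned.partition(":")
    if not sep or any(c in scheme for c in "/?#"):
        return True  # no scheme: relative URL
    return scheme in ALLOWED_SCHEMES


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # this is where onclick and friends disappear
            value = value or ""
            if name == "href" and not _safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives decoded, so it is re-escaped on the way out.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


dirty = ('<p onclick="evil()">Hi <a href="javascript:alert(1)">x</a>'
         '<script>alert(2)</script></p>')
print(sanitize(dirty))  # -> <p>Hi <a>x</a></p>
```

The important design choice is that unknown input is dropped by default. A filter that tries to list the bad things (a blacklist) tends to miss cases such as event handlers on unexpected tags or encoded `javascript:` URLs.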
If you're using a WYSIWYG HTML editor such as TinyMCE, it has built-in functions to strip potentially malicious code such as <script> tags. You can check the function that instantiates the editor for details of the banned tags, functions, and so on.
Note: TinyMCE should not be used for sanitizing, because it is a client-side editor and anyone can bypass it by posting HTML to the server directly. Sanitizing should only be done server-side.