We are working on a new feature in a Node.js application where users can upload any document (PDF, Excel, image, etc.) and we process it in order to show it to other users. By "process" I mean extracting its text, generating a thumbnail, counting pages, and more. While building this, there is a very important point to consider: how can we make sure that attackers cannot use this feature to attack our site or even our users? XSS is the best-known risk, but I'm sure there are other vulnerabilities we should prevent.
I'm sure this is a central concern for any site or tool that handles UGC, but I was not able to find any standards, cheat sheets, or even a bullet list describing how to protect a site while processing UGC.
Any reference, link, shared knowledge, or even an example in another programming language would be appreciated!
Web security is a broad topic, but you're right that XSS is the most common attack. It comes in a few types: stored (persistent), reflected (non-persistent), and DOM-based.
In my opinion, the third one (DOM-based) is the most relevant for your app, because someone can upload a script that ends up in an IMG SRC, and when a user loads that image, the malicious script could steal sensitive data such as cookies or auth tokens.
The most common protection in this case is to sanitize special characters during the "handling process". Sanitizing here means replacing HTML special characters with their HTML entities, for example < becomes &lt; and so on. You can write a regexp to do this manually, but I'd rather suggest using a library such as DOMPurify, which has many useful features.
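To make the entity replacement concrete, here is a minimal sketch of the manual approach in plain JavaScript. The helper name escapeHtml and the entity table are my own illustration, not part of DOMPurify or any other library, and it assumes the extracted text will be shown as plain text inside HTML. If you need to keep some markup, a dedicated sanitizer is the safer choice.

```javascript
// Hypothetical helper (not from any library): escapes the five characters
// that are significant in HTML text and quoted attribute values.
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(untrusted) {
  // Coerce to a string first so numbers or null do not throw.
  return String(untrusted).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

// Example: text extracted from an uploaded document that contains markup.
const extracted = '<img src=x onerror="alert(1)">';
console.log(escapeHtml(extracted));
// prints: &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```

The escaped string is rendered by the browser as visible text, so the onerror handler never runs. Note that this only covers HTML text and quoted attribute contexts; values placed into URLs, inline scripts, or CSS need context-specific encoding.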
As for whitelists and other possible vulnerabilities, please check these resources; I think they may be useful for you:
OWASP Web Security cheat sheet - a very detailed list of web security rules, with examples and solutions
OWASP XSS Filter cheat sheet - a document dedicated to XSS, as the most common vulnerability