I need to strip tags from user input before saving it to the database.
I'm well aware of the strip_tags method, but it also HTML-escapes the string, as do all the other recommended methods:
Rails::Html::FullSanitizer.new.sanitize '&'
=> "&amp;"
Rails::Html::WhiteListSanitizer.new.sanitize('&', tags: [])
=> "&amp;"
ActionController::Base.helpers.strip_tags "&"
=> "&amp;"
The string I want to sanitize must NOT be escaped: it gets exported via an API, written to files, and so on, so it is NOT only rendered as HTML. Even in HTML it causes trouble: in a call like link_to ActionController::Base.helpers.strip_tags("&"), link_to escapes the string a second time, so the link text shows up as &amp; in the frontend.
As a workaround I've wrapped strip_tags in CGI.unescapeHTML to get more or less the expected result, but I'd like a more direct solution. I'm also worried about what else strip_tags might do; that's a lot of moving parts for such a small piece of functionality, which means more that can go wrong or break.
Real-world examples:
JPMorgan Chase & Co should still be JPMorgan Chase & Co (not JPMorgan Chase &amp; Co) after removing tags
test<script>alert('hacked!');</script>&test should become test&test after stripping tags
And the already-escaped string:
"test &lt;script&gt;alert('hacked!')&lt;/script&gt;"
should still be
"test &lt;script&gt;alert('hacked!')&lt;/script&gt;"
after stripping HTML.
The alternative solutions that I've found, or that were proposed, turn the escaped text back into real tags:
> Nokogiri::HTML("test &lt;script&gt;alert('hacked!')&lt;/script&gt;").text
=> "test <script>alert('hacked!')</script>"
> Loofah.fragment("test &lt;script&gt;alert('hacked!')&lt;/script&gt;").text(encode_special_chars: false)
=> "test <script>alert('hacked!')</script>"
So they're also a no-go.
You have to parse the HTML and extract the text elements. Use Nokogiri to do that.
Rails already depends on Nokogiri, so using it adds no extra dependency.
You will get all the text, including the content of <script> tags.
You can strip the <script> tags as well, but with the tags themselves stripped out the string can do no harm, so it might not be worth the effort.
See the Nokogiri tutorials for more.