Highlighting fields containing HTML


I have a field that may contain HTML entered by users. The simple highlighter does not escape the input before adding the <em> tags. For example, if the input is

"This is a <caption>"

and I search for "caption", I get:

"This is a <<em>caption</em>>"

But I want to get:

"This is a &lt;<em>caption</em>&gt;"

When rendered as HTML, this looks the same as the input, with the matched word highlighted.

There are 3 best solutions below

1
On BEST ANSWER

One technique is to use some other sentinel string to indicate highlighting. See hl.simple.pre and hl.simple.post. That way you can perform escaping first, without losing your highlighting, and then replace the sentinels with highlighting markup as a final step.

For example, the Sunspot Solr client for Ruby uses @@@hl@@@ for the hl.simple.pre param, and @@@endhl@@@ for the hl.simple.post param. Using these values…

  • Solr returns: This is a <@@@hl@@@caption@@@endhl@@@>
  • HTML escaping: This is a &lt;@@@hl@@@caption@@@endhl@@@&gt;
  • Replace the sentinels: This is a &lt;<em>caption</em>&gt;
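
The three steps above can be sketched in a few lines of Python. This is only an illustration, assuming the sentinel values quoted from Sunspot and that the snippet arrives as a plain string; the function name render_snippet is made up for this example and is not part of Solr or Sunspot.

```python
import html

# Sentinel values assumed to be sent as hl.simple.pre / hl.simple.post.
HL_PRE = "@@@hl@@@"
HL_POST = "@@@endhl@@@"


def render_snippet(snippet: str) -> str:
    """Escape a raw highlighted snippet, then swap sentinels for <em> tags.

    render_snippet is a hypothetical helper, not a Solr or Sunspot API.
    """
    # Step 1: escape &, < and > in the user-supplied text.
    # The sentinels contain none of these characters, so they survive.
    escaped = html.escape(snippet, quote=False)
    # Step 2: replace the sentinels with the real highlighting markup.
    return escaped.replace(HL_PRE, "<em>").replace(HL_POST, "</em>")


raw = "This is a <@@@hl@@@caption@@@endhl@@@>"
print(render_snippet(raw))
```

One caveat: if the indexed text itself can contain the sentinel strings, they would also be turned into markup, so pick sentinels that are unlikely to occur in user input.
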
1
On

You can use String.replace to replace "<<" with "&lt;<" and ">>" with ">&gt;". If you need more specific replacements, you can add them as well.
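
As a rough sketch of this replacement approach, here it is in Python using str.replace (the helper name patch_brackets is made up for illustration). It also shows a limitation: only angle brackets directly adjacent to the highlight tags are escaped, so other markup in the field passes through untouched.

```python
def patch_brackets(highlighted: str) -> str:
    """Escape angle brackets that sit directly next to the <em> tags.

    patch_brackets is a hypothetical helper used only for this example.
    """
    return highlighted.replace("<<", "&lt;<").replace(">>", ">&gt;")


# The case from the question works as intended.
print(patch_brackets("This is a <<em>caption</em>>"))

# A bracket that is not adjacent to the highlight tags is left unescaped.
print(patch_brackets("A <b> and a <em>caption</em>"))
```
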

0
On

Solr 4.3.1 has an option to enable a specific encoder for the highlighting to produce XML/HTML escaped snippets. Put

<str name="hl.encoder">html</str> 

below /config/requestHandler[@name="/select"]/lst[@name="defaults"] in solrconfig.xml. The parameter can also be set in the URL with &hl.encoder=html. The standard solrconfig.xml contains a definition for this encoder:

<!-- Configure the standard encoder -->
<encoder name="html" class="solr.highlight.HtmlEncoder" />
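
To illustrate setting the parameter per request instead of in solrconfig.xml, here is a small Python sketch that builds such a URL. The host, core name (collection1) and field name (title) are placeholders assumed for the example, and highlight_query_url is a made-up helper; only hl, hl.fl and hl.encoder are actual Solr parameters.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url: str, query: str, field: str) -> str:
    """Build a select URL that asks Solr for HTML-encoded highlights.

    highlight_query_url is a hypothetical helper, not part of any Solr client.
    """
    params = {
        "q": query,
        "hl": "true",          # turn highlighting on
        "hl.fl": field,        # field to highlight
        "hl.encoder": "html",  # escape the snippet text as HTML
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder host, core and field names.
print(highlight_query_url("http://localhost:8983/solr/collection1/select", "Y", "title"))
```
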

Example: "X < Y < Z" will be highlighted as

X &lt; <em>Y</em> &lt; Z

when searching for "Y". The Solr XML-response contains

X &amp;lt; &lt;em&gt;Y&lt;/em&gt; &amp;lt; Z

in the str-element, of course.
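
The double escaping can be checked with a short Python sketch: parsing the str element with an XML parser removes one layer of escaping and yields the HTML snippet shown above. The bare str fragment here is a simplified stand-in for a full Solr response, and snippet_from_xml is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def snippet_from_xml(fragment: str) -> str:
    """Return the text of a single XML element, with XML escaping undone.

    snippet_from_xml is a hypothetical helper used only for this example.
    """
    return ET.fromstring(fragment).text


# Simplified stand-in for the str element in the Solr XML response.
xml_fragment = "<str>X &amp;lt; &lt;em&gt;Y&lt;/em&gt; &amp;lt; Z</str>"
print(snippet_from_xml(xml_fragment))
```
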