Solr Spellcheck for Multi Word Phrases

I have a problem with Solr spellcheck suggestions for multi-word phrases. For the query 'red chillies':

q=red+chillies&wt=xml&indent=true&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true

I get

<lst name="suggestions">
  <lst name="chillies">
    <int name="numFound">2</int>
    <int name="startOffset">4</int>
    <int name="endOffset">12</int>
    <int name="origFreq">0</int>
    <arr name="suggestion">
      <lst><str name="word">chiller</str><int name="freq">4</int></lst>
      <lst><str name="word">challis</str><int name="freq">2</int></lst>
    </arr>
  </lst>
  <bool name="correctlySpelled">false</bool>
  <str name="collation">red chiller</str>
</lst>

The problem is that even though 'chiller' has 4 results in the index, 'red chiller' has none, so we end up suggesting a phrase with 0 results.

What can I do to make spellcheck work on the whole phrase only? I tried using KeywordTokenizerFactory in the query analyzer:

<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

And I also tried adding

<str name="sp.query.extendedResults">false</str>

within

<lst name="spellchecker">

in solrconfig.xml.

But neither seems to make a difference.

What would be the best way to make spellcheck return only collations that have results for the whole phrase? Thanks!

There is 1 solution below

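The real issue is that Solr has to be told how to test each candidate collation before returning it. Specify spellcheck.collateParam.q.op=AND and, optionally, spellcheck.collateParam.mm=100% (mm only matters for the DisMax and eDisMax query parsers). These parameters override q.op and mm for the internal collation test queries only, so a collation such as 'red chiller' counts as a hit only when every term matches in the same document. Note that collations are tested against the index only when spellcheck.maxCollationTries is greater than 0, and it defaults to 0.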

You can read more about this in the Solr docs.
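
As a rough illustration, the collation parameters mentioned above could be set as request handler defaults in solrconfig.xml instead of being passed on every request. This is only a sketch: the handler name "/select", the value 5 for spellcheck.maxCollationTries, and the assumption that the spellcheck component is registered under the name "spellcheck" are not from the question and would need to match your own setup.

```xml
<!-- Sketch only: handler name, component name, and numeric values are assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.collate">true</str>
    <!-- Test up to 5 candidate collations against the index (0, the default, disables testing) -->
    <str name="spellcheck.maxCollationTries">5</str>
    <!-- Return at most one verified collation -->
    <str name="spellcheck.maxCollations">1</str>
    <!-- Applied only to the internal collation test queries, not to the main query -->
    <str name="spellcheck.collateParam.q.op">AND</str>
    <str name="spellcheck.collateParam.mm">100%</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With settings along these lines, a collation like 'red chiller' should be dropped when no document contains both terms; if none of the tested collations returns hits, no collation is returned at all.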