Solr search with misspelled terms

I have integrated Solr with my eCommerce web application. I index the product title and many other product fields into Solr. I have indexed BLÅBÆRSOMMEREN as a product title/name, and I have also added an EdgeNGram filter to the title field. Because of EdgeNGram, searching for any of the generated tokens returns the product. Because of spell check, searching for a misspelling such as BLÅBÆRISOMMEREN also returns it. But searching for BLÅBÆRI returns nothing, because no such token exists in the index.

I want the results to include products that contain BLÅBÆR, because that token does exist. The same should apply to any other misspelled search.

How can I achieve this? Any help will be appreciated!

Thanks.

There are 2 best solutions below

Toby Cole (best answer)

It sounds like you may have Solr's tokenization configured differently for indexing and querying.

So, in your example the following terms may appear in the index:

  • B
  • BL
  • BLÅ
  • BLÅB
  • BLÅBÆ
  • BLÅBÆR
  • BLÅBÆRS

However, because your query terms are not being processed into n-grams, you are only searching for

  • BLÅBÆRI

which does not appear within your indexed terms.
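The mismatch can be reproduced outside Solr. The following is a minimal Python sketch, not Solr's actual implementation: `edge_ngrams` is a hypothetical helper that lowercases a term and returns its front-anchored prefixes, roughly what an edge n-gram filter emits, and the gram sizes are assumed values.

```python
def edge_ngrams(term, min_gram=1, max_gram=15):
    """Hypothetical stand-in for an edge n-gram filter: lowercase, then take prefixes."""
    term = term.lower()
    longest = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, longest + 1)]


# Terms written to the index for the product title.
indexed_terms = set(edge_ngrams("BLÅBÆRSOMMEREN"))

# Query analyzed WITHOUT n-grams: one lowercased term, which is not in the index.
raw_query_term = "BLÅBÆRI".lower()
raw_query_matches = raw_query_term in indexed_terms

# Query analyzed WITH the same n-grams: its shorter prefixes are in the index.
shared_terms = [gram for gram in edge_ngrams("BLÅBÆRI") if gram in indexed_terms]

print(raw_query_matches)
print(shared_terms)
```

In this sketch the unprocessed query term finds nothing, while the n-grammed query shares every prefix up to `blåbær` with the indexed title.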

Applying n-grams only at index time is common practice. In your use case, however, it sounds like you want partial matches returned in your results.

Check your Solr schema to make sure the query-time analyzer has an EdgeNGram filter matching the index-time one, e.g.:

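<!-- Field type that applies the same edge n-gram analysis at index time and at query time -->
<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
   <!-- Index time: lowercase and split on non-letters, then emit prefixes of 2 to 15 characters -->
   <analyzer type="index">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
   </analyzer>
   <!-- Query time: the same chain, so a query such as BLÅBÆRI is also expanded into prefixes -->
   <analyzer type="query">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
   </analyzer>
</fieldType>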

Make sure you sort by score, though, as this strategy will most likely produce many false positives!
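To see why false positives appear and why score ordering helps, here is a rough Python sketch. It is an illustration only: the product titles are hypothetical, `edge_ngrams` and `rank_by_overlap` are made-up helpers, and the count of shared n-grams is a crude stand-in for Solr's real relevance score, not how Solr computes it.

```python
def edge_ngrams(term, min_gram=2, max_gram=15):
    """Hypothetical stand-in for an edge n-gram filter: lowercase, then take prefixes."""
    term = term.lower()
    longest = min(max_gram, len(term))
    return {term[:n] for n in range(min_gram, longest + 1)}


def rank_by_overlap(query, titles):
    """Return (title, shared_gram_count) pairs, best first, dropping non-matches."""
    query_grams = edge_ngrams(query)
    scored = [(title, len(query_grams & edge_ngrams(title))) for title in titles]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical single-word product titles.
titles = ["BLOMST", "SOMMER", "BLÅKLOKKE", "BLÅBÆRSOMMEREN"]

ranking = rank_by_overlap("BLÅBÆRI", titles)
for title, shared in ranking:
    print(title, shared)
```

Any title that merely starts with `bl` matches the query, but the title sharing the most prefixes comes first, which is the effect sorting by score is meant to achieve.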

Peter Dixon-Moses

For misspelled words, you can use a fuzzy query, which matches index terms within an edit distance of 1 or 2 of the query term (written ~1 or ~2).

Using your example, BLÅBÆRISOMMEREN is edit distance 1 (one character difference) from your indexed term.
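The edit distances involved can be checked with a short Python sketch. This assumes plain Levenshtein distance (insertions, deletions, substitutions) on lowercased terms as a simplification; `levenshtein` is an illustrative helper, not a Solr API, and Solr's own fuzzy matching may count edits slightly differently.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


indexed_term = "blåbærsommeren"

# The misspelling has one extra "i", so it is a single edit away.
misspelled_distance = levenshtein("blåbærisommeren", indexed_term)

# The short query is many edits away from the full indexed term.
short_query_distance = levenshtein("blåbæri", indexed_term)

print(misspelled_distance)
print(short_query_distance)
```

Under these assumptions the misspelling is 1 edit from the indexed term, while the short query BLÅBÆRI is 8 edits away, far beyond what ~1 or ~2 can cover.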

Therefore the query q=title:BLÅBÆRISOMMEREN~1 will match your title term, but BLÅBÆRI will not (without the n-gram approach from the previous answer).
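When sending such a query over HTTP, the non-ASCII characters and the colon need URL encoding. Below is a small Python sketch of building the request URL. The base URL, the core name `products`, and the helper `fuzzy_query_url` are assumptions for illustration, and the helper only accepts the edit distances 1 and 2 discussed above.

```python
from urllib.parse import parse_qs, urlencode


def fuzzy_query_url(base_url, field, term, max_edits=1):
    """Build a select URL for a fuzzy query on one field (hypothetical helper)."""
    if max_edits not in (1, 2):
        raise ValueError("this sketch only accepts max_edits of 1 or 2")
    params = {"q": f"{field}:{term}~{max_edits}"}
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local Solr instance with a core named "products".
url = fuzzy_query_url("http://localhost:8983/solr/products", "title", "BLÅBÆRISOMMEREN")
print(url)

# Decoding the query string gives back the original q parameter unchanged.
decoded_q = parse_qs(url.split("?", 1)[1])["q"][0]
print(decoded_q)
```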

You can also investigate Solr's Suggester component if you're trying to build auto-suggest. It can also handle fuzzy suggestions (e.g. BLÅBÆRI -> BLÅBÆRSOMMEREN) and typically responds faster than a traditional query.