Expanding Solr search: "volcano" to match "volcanic"


I have websolr set up on my Rails app running on Heroku. I just noticed that a search for "volcano" did not return all the results I would have expected. Specifically, it did not return a result which included both "volcanic" and "stratovolcanoes".

How do I need to modify the Solr configuration to address this?

This is the relevant section from my schema.xml:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldType>

Addition: I don't think this is relevant, but just in case:

My Rails Photo.rb model is set up like this:

  searchable do
    text :caption, :stored => true
    text :category do
      category.breadcrumb
    end

    integer :user_id
    integer :category_id
    string :caption
    string :rights
  end

Caption and category are the two text fields I'm searching on. Caption is free-form text, whereas category is a text string like "Earth Science > Volcanoes".

This is my synonyms config that shows in websolr (I added the last line):

#some test synonym mappings unlikely to appear in real input text
aaa => aaaa
bbb => bbbb1 bbbb2
ccc => cccc1,cccc2
a\=>a => b\=>b
a\,a => b\,b
fooaaa,baraaa,bazaaa

# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
#after us won't split it into two words.

# Synonym mappings can be used for spelling correction too
pixima => pixma

volcano => volcanic,stratovolcanoes

There are 2 best solutions below


If stemming does not give you the desired results for specific cases, you could add a solr.SynonymFilterFactory filter, as described here:

<fieldtype name="syn" class="solr.TextField">
  <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldtype>

You will then be able to maintain a synonym file:

volcano => volcanic, stratovolcanoes
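
One thing to check with a rule like this: as far as I understand the Solr synonym format, an explicit mapping (`=>`) replaces the left-hand token with the right-hand alternatives, so "volcano" itself is not kept unless it also appears on the right. A comma-separated group treats the terms as equivalent instead. The Ruby sketch below only illustrates that difference; `parse_synonym_rules` and `apply_synonyms` are hypothetical helpers, not part of Solr or Sunspot. It lowercases terms to mimic `ignoreCase="true"` and ignores escaped sequences such as `\=>` and `\,`.

```ruby
# Minimal sketch of how Solr-style synonym rules could be interpreted.
# Both methods are hypothetical helpers written for illustration only.
def parse_synonym_rules(text, expand: true)
  rules = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    if line.include?("=>")
      # Explicit mapping: tokens on the left are replaced by the right-hand side.
      lhs, rhs = line.split("=>", 2)
      targets = rhs.split(",").map { |t| t.strip.downcase }
      lhs.split(",").each { |src| rules[src.strip.downcase] = targets }
    else
      # Equivalence group: with expand, every term maps to the whole group;
      # without it, every term collapses to the first term in the group.
      group = line.split(",").map { |t| t.strip.downcase }
      targets = expand ? group : [group.first]
      group.each { |term| rules[term] = targets }
    end
  end
  rules
end

# Returns the tokens that would be emitted for a single input token.
def apply_synonyms(rules, token)
  rules.fetch(token.downcase, [token.downcase])
end

explicit = parse_synonym_rules("volcano => volcanic, stratovolcanoes")
group    = parse_synonym_rules("volcano, volcanic, stratovolcanoes")

p apply_synonyms(explicit, "volcano") # "volcano" itself is not in the result
p apply_synonyms(group, "volcano")    # all three terms, including "volcano"
```

If the original term should still match, listing it on the right-hand side as well (`volcano => volcano, volcanic, stratovolcanoes`) or using the comma-separated group form would be the options I would try first.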

I believe this is caused by SnowballPorterFilterFactory.

Including this filter in your analyzer chain causes Solr to apply stemming to your terms. In this case, with language="English", Solr uses the Snowball English (Porter) stemmer.

If you do not need stemming, you could remove that filter.