I have websolr set up on my Rails app running on Heroku. I just noticed that a search for "volcano" did not return all the results I would have expected. Specifically, it did not return a result which included both "volcanic" and "stratovolcanoes".
How should I modify the Solr configuration to address this?
This is the relevant section from my schema.xml:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldType>
Addition: I don't think this is relevant, but just in case:
My Rails Photo.rb model is set up like this:
searchable do
  text :caption, :stored => true
  text :category do
    category.breadcrumb
  end
  integer :user_id
  integer :category_id
  string :caption
  string :rights
end
Caption and category are the two text fields I'm searching on. Caption is free-form text, whereas category is a text string like "Earth Science > Volcanoes".
This is my synonyms config that shows in websolr (I added the last line):
#some test synonym mappings unlikely to appear in real input text
aaa => aaaa
bbb => bbbb1 bbbb2
ccc => cccc1,cccc2
a\=>a => b\=>b
a\,a => b\,b
fooaaa,baraaa,bazaaa
# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
#after us won't split it into two words.
# Synonym mappings can be used for spelling correction too
pixima => pixma
volcano => volcanic,stratovolcanoes
I believe this is caused by the introduction of SnowballPorterFilterFactory. Including this filter in your analyzer chain causes Solr to apply stemming to your terms. In this case, Solr does Porter stemming.
If you do not need stemming, you could remove that filter from the analyzer.
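As a rough sketch, assuming the rest of the fieldType from the question stays exactly as posted, the analyzer without stemming might look like this:

```xml
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- SnowballPorterFilterFactory removed: terms are no longer stemmed -->
  </analyzer>
</fieldType>
```

Because this analyzer applies at index time as well as query time, existing documents would most likely need to be reindexed after the change (with Sunspot, for example, via `rake sunspot:reindex`).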