Solr WordDelimiterFilterFactory and Period Characters

I am using Solr through the sunspot_rails v1.2 gem.

In my schema.xml file, I have the following:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1" preserveOriginal="1"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="50" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" preserveOriginal="1"/>
  </analyzer>
</fieldType>

If I index the string firstname.lastname@example.com, I can find it if I search for example.com, but not if I search for firstname.lastname.

If I remove WordDelimiterFilterFactory from the query settings, then I can find the email by searching for firstname.lastname; however, nothing comes up when I search for example.com.

How can I modify the configuration file to be able to search by either of these means?

There is 1 best solution below

You can debug how your current index and query analysis configuration affects your searches by using the Solr Admin Analysis page. Another option is to use Luke to peek into the Lucene index.

However, there is an alternative that you could explore. Since email addresses and URLs need to be handled in a specific way, Lucene has a variant of StandardTokenizer that deals with them specifically: UAX29URLEmailTokenizer, with the corresponding Solr factory solr.UAX29URLEmailTokenizerFactory.
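
As a rough, untested sketch of that alternative, the field type could use the email/URL-aware tokenizer on both the index and query side. The field type name text_email is made up for illustration, the filter chain is only an assumption carried over from the question, and the tokenizer factory is only available in Solr releases that ship it (3.1 or later, as far as I know), so check the result on the Analysis page before relying on it:

```xml
<!-- Hypothetical field type; the name "text_email" is illustrative only -->
<fieldType name="text_email" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <!-- Emits a whole email address or URL as a single token -->
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Splits that token on delimiters and keeps the original alongside the parts -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1" preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same tokenizer on the query side so both sides split input the same way -->
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" preserveOriginal="1"/>
  </analyzer>
</fieldType>
```

The stemming and edge n-gram filters from the question are left out here to keep the token stream easy to inspect. If you add them back, re-run both firstname.lastname and example.com through the Analysis page to confirm that the index and query tokens still overlap.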