Solr search not returning documents

I am trying to add PorterStemFilterFactory to my analyzer during indexing. But when I query for documents, the results no longer include documents that I got before adding this filter. How can I get documents with both stemming and normal filters?

schema:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" "/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

When I search for the query "agile" with the analyzer below (without the stemming filter), it returns the documents in which the term was found.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" "/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks in Advance

There are 2 best solutions below

2
BEST ANSWER

PorterStemFilterFactory removes common endings from words.

In your case, the word agile is reduced to agil.

You can check this in the Porter stemmer's sample vocabulary at https://tartarus.org/martin/PorterStemmer/voc.txt (search there for the word agile).

Then look up the corresponding output after applying Porter stemming at https://tartarus.org/martin/PorterStemmer/output.txt

You will see that you can't find the word agile there, because it is stemmed to agil.

That is why searching for agile returns nothing: the index analyzer now stores the token agil, while your query analyzer has no stemming filter and still looks for agile, a term that no longer exists in the index. Try searching for agil and you should see the results.
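
One way to make a query for agile match again, assuming you want to keep Porter stemming at index time, is to apply the same stemmer in the query analyzer so that both sides produce agil. A minimal sketch of the query analyzer from the question with that one filter added (everything else is unchanged, and I have not tested it against your synonyms.txt):

```xml
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Added: stem query terms the same way as indexed terms, so "agile" becomes "agil" -->
  <filter class="solr.PorterStemFilterFactory"/>
</analyzer>
```

The Analysis screen in the Solr admin UI shows the tokens each analyzer produces for a given input, which makes it easy to confirm that the index and query sides end up with the same term.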

1

Using solr.PorterStemFilterFactory will generate the token agil.

I suggest you use

<filter class="solr.EnglishMinimalStemFilterFactory"/>

After this filter, agile stays agile.

Use filters as per your requirements.
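
If the goal is to match both the original and the stemmed form, as the question asks, one option worth considering is to put solr.KeywordRepeatFilterFactory in front of the stemmer and solr.RemoveDuplicatesTokenFilterFactory after it. The first emits every token twice, once marked as a keyword that the stemmer leaves alone, and the second drops the copy when stemming did not change the word. The following is only a sketch built on the index analyzer from the question, not a tested configuration:

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" "/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Emit each token twice: one copy is flagged as a keyword and will not be stemmed -->
  <filter class="solr.KeywordRepeatFilterFactory"/>
  <!-- Stems only the unflagged copy, so "agile" yields both "agile" and "agil" -->
  <filter class="solr.PorterStemFilterFactory"/>
  <!-- Remove the second copy when the stem is identical to the original token -->
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

With this chain the index holds agile as well as agil, so the unstemmed query analyzer from the question would find agile again. To also match other word forms through their stem, the same three filters would have to be appended to the query analyzer. Any change to the index analyzer only takes effect for documents indexed afterwards, so existing documents need to be reindexed.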