Filter on solr splitting by list of strings

I have this fieldType in my Solr schema:

<fieldType name="suggestion_text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                splitOnNumerics="1"
                preserveOriginal="1"
        />
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

This works fine for almost every model I have. For example, for model AB1234 I can search for 1234 and it is found. But there is one particular case I want to cover, and I'm looking for a better solution than my current one:

Let's say AB is the manufacturer and 1234 is the actual part number, but in my database they are stored together as AB1234. If I have a manufacturer A0 and a part number A01234, then with the current implementation a search for 1234 won't find it.

I found a workaround by replacing the EdgeNGramFilterFactory with an NGramFilterFactory, but that's not the solution I want. I want Solr to be able to search while excluding the first two characters if they are a letter followed by a number, or in the extreme case; either way, I need it to match both with A0 and without A0.
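
One way to index each token both with and without a leading letter+digit prefix, without switching to NGramFilterFactory, might be a solr.PatternCaptureGroupFilterFactory placed directly after the tokenizer. The sketch below is an untested assumption, not a verified configuration: the field type name suggestion_text_prefix is hypothetical, and the pattern assumes the prefix is always exactly one letter followed by one digit at the start of the token.

```xml
<fieldType name="suggestion_text_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Assumption: for A01234 this emits the original token and the captured
             remainder 1234; AB1234 does not match and passes through unchanged. -->
        <filter class="solr.PatternCaptureGroupFilterFactory"
                pattern="^[a-zA-Z][0-9](.+)$"
                preserve_original="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                splitOnNumerics="1"
                preserveOriginal="1"/>
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

With this ordering the extra token 1234 reaches the edge n-gram filter as its own token, so a query for 1234 should have a prefix to match, while the original A01234 is still indexed. The analysis page of the Solr admin UI is the place to confirm the actual token stream.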

I don't know if I was clear. Anyway, I tried regular expressions, creating a new field and using this filter on it:

<filter class="solr.PatternReplaceFilterFactory" pattern="(A0)" replacement="" replace="all" />

or

<filter class="solr.PatternReplaceFilterFactory" pattern="[a-zA-Z][0-9]" replacement="" replace="all" />

but this is not giving the expected results.
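
A possible reason the second pattern misbehaves: [a-zA-Z][0-9] is not anchored, so with replace="all" it removes every letter-digit pair it finds. Applied to a whole token, AB1234 would become A234. A hedged sketch of an anchored variant in a separate field follows. The names partnumber, partnumber_noprefix and suggestion_text_noprefix are hypothetical, and the filter is assumed to run right after the tokenizer, before any splitting or n-gram filter.

```xml
<fieldType name="suggestion_text_noprefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- ^ anchors the match to the start of the token;
             replace="first" stops after that single match -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="^[a-zA-Z][0-9]" replacement="" replace="first"/>
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<!-- Hypothetical field names: the original field stays as it is,
     and its value is copied into the prefix-stripped field -->
<field name="partnumber_noprefix" type="suggestion_text_noprefix" indexed="true" stored="false"/>
<copyField source="partnumber" dest="partnumber_noprefix"/>
```

Searching both fields together, for instance through the qf parameter of the edismax query parser, would then cover the cases with A0 and without A0.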

Can you help me? Thank you

There are 1 best solutions below

Please try the field type below for your problem.

<fieldType name="text_en_splitting_test" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" splitOnNumerics="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
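
The Solr reference guide recommends following a graph filter such as WordDelimiterGraphFilterFactory with FlattenGraphFilterFactory when it is used at index time, because the index cannot store a token graph. A sketch of the index analyzer above with that one filter added, otherwise unchanged and not tested against the asker's data:

```xml
<analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" splitOnNumerics="1" preserveOriginal="1"/>
    <!-- Flattens the token graph produced by the filter above for indexing -->
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```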

This worked for the following search terms:

  1. AB1234
  2. AB
  3. 1234

Please find the screenshots of the Solr analysis page for the suggested field type with these search terms.

Search term AB

Search term 1234