solr partial word match


Background:

  • solr 4.10;
  • linux/java - I guess they are irrelevant at this point;

I have this word: ABCDEF. I need to find all documents whose name (field: NAME) has a given number of letters in common with ABCDEF. Example: for a 4-letter match, ABCDEF should match:

  • itself (ABCDEF, ABCD, BCDE, etc);
  • various permutations: B..A.DE..., ..F..A.DE... - where the dots stand for letters other than the ones in ABCDEF.

I considered using a regex (example: ^(.?)([ABCDEF] (.?)){4}$), but it would also match A..A..B..C (A appears twice, so only three distinct letters match), and I don't want that.
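To make the requirement concrete, here is a minimal Python sketch of the matching rule, assuming that "letters in common" means distinct letters shared by both strings. The function name `shares_letters` is hypothetical and only illustrates the rule; it is not a Solr feature.

```python
def shares_letters(name, word, min_common):
    """Return True if name and word share at least min_common distinct letters."""
    # Sets discard repeats, so a doubled "A" is counted only once.
    common = set(name) & set(word)
    return len(common) >= min_common


# "X" stands in for letters that are not part of ABCDEF.
print(shares_letters("BXXAXDEXX", "ABCDEF", 4))   # B, A, D, E are shared
print(shares_letters("AXXAXXBXXC", "ABCDEF", 4))  # only A, B, C are shared
```

Counting distinct letters is what separates the wanted matches from the A..A..B..C case that the regex accepts.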

The field is of type string. However, adding another field that is tokenized and indexed differently would not be a problem.

Also, fuzzy search/match is not an option, as I need those exact letters to appear in the matched field.

Any idea?

Thanks!

There is 1 solution below

Index each unique letter (or key) as a separate token. You can either split the value up in your indexing code or use an update processor to split the field into characters. Be sure to use a field type whose analysis chain doesn't drop short tokens (for example through a stop word filter).
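Splitting in the indexing code could look like the following Python sketch. The field name `name_letters` and the helper names are hypothetical, and uppercasing the input is an assumption; adapt both to your schema. The resulting dictionary is the document you would send to Solr with whatever client you already use.

```python
def unique_letters(name):
    """Return the distinct letters of name, in order of first appearance."""
    seen = []
    for ch in name.upper():  # assumption: matching is case-insensitive
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return seen


def build_document(name):
    """Build a document with the original name plus one token per unique letter."""
    return {
        "NAME": name,
        # hypothetical multi-valued field holding one letter per value
        "name_letters": unique_letters(name),
    }


print(build_document("ABCDEF"))
```

Storing the letters in a separate multi-valued field leaves the original string field NAME untouched.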

When you have a field with each letter / key by itself, use the mm (minimum should match) parameter of the (e)dismax query parser to set the number of terms that have to match, and provide the letters / keys to search for as separate terms.

Index: ABCDEF
Document: field: (A, B, C, D, E, F)

Query: BCDF
/select?q=B C D F&mm=4&defType=dismax

Query: BCDF, at least two must match
/select?q=B C D F&mm=2&defType=dismax
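The request URLs above can also be assembled programmatically. This is a minimal Python sketch; the helper name `build_letter_query` and the field name `name_letters` passed as `qf` are hypothetical, so substitute the field that holds your single-letter tokens.

```python
from urllib.parse import urlencode


def build_letter_query(letters, min_match, field="name_letters"):
    """Build a dismax request path that requires min_match of the letters."""
    params = {
        "q": " ".join(letters),  # one query term per letter
        "mm": min_match,         # minimum number of terms that must match
        "defType": "dismax",
        "qf": field,             # hypothetical field with single-letter tokens
    }
    return "/select?" + urlencode(params)


print(build_letter_query("BCDF", 4))
print(build_letter_query("BCDF", 2))
```

`urlencode` takes care of escaping the spaces between the terms, which the hand-written URLs above leave unescaped.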