Background:
- Solr 4.10;
- Linux/Java - I guess they are irrelevant at this point.
I have this word: ABCDEF. I need to find all documents whose name (field: NAME) has a given number of letters in common with ABCDEF. Example: for a 4-letter match, ABCDEF should match:
- itself (ABCDEF, ABCD, BCDE, etc.);
- various permutations: B..A.DE..., ..F..A.DE..., where the dots stand for letters other than the ones in ABCDEF.
I would try to use regex (example: ^(.?)([ABCDEF] (.?)){4}$), but this will also match A..A..B..C (A is there twice) and I don't need this one.
The field is of type string. However, it would not be a problem to add another field that is tokenized and indexed differently.
Also, fuzzy search/match would not be an option - as I need those exact letters to appear in the matched field.
Any idea?
Thanks!
Index each unique letter (or key) as a separate token - you can either split this up in your indexing code or use an update processor to split the field into characters. Be sure to use a field type that doesn't drop short tokens (for example, by treating single letters as stop words).
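If you go the indexing-code route, the split could look roughly like the sketch below. It assumes the extra field is whitespace-tokenized; the field name NAME_LETTERS and the helper names are made up for illustration, and upper-casing is an assumption about how you want letters compared.

```python
def unique_letter_tokens(name):
    """Return the distinct letters of `name`, upper-cased, in first-seen order."""
    seen = []
    for ch in name.upper():
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return seen


def to_solr_doc(doc_id, name):
    # NAME_LETTERS is a hypothetical extra field holding one token per letter;
    # the original NAME string field is kept unchanged.
    return {
        "id": doc_id,
        "NAME": name,
        "NAME_LETTERS": " ".join(unique_letter_tokens(name)),
    }


print(to_solr_doc("1", "AXXAXXBXXC"))
# {'id': '1', 'NAME': 'AXXAXXBXXC', 'NAME_LETTERS': 'A X B C'}
```

The repeated A collapses into a single token here, so it can only ever count once towards the number of matching letters.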
When you have a field with each letter / key by itself, use the mm parameter to (e)dismax to provide the number of terms that have to match, and provide the letters / keys to search for as separate terms.
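As a rough sketch of what the request could look like, the snippet below builds the query parameters for the ABCDEF / 4-letter case. defType, q, qf and mm are standard (e)dismax parameters; the field name NAME_LETTERS and both helper functions are assumptions for illustration. would_match is only a local stand-in for the behaviour you should expect from mm, not something Solr provides.

```python
from urllib.parse import urlencode


def build_params(word, min_common, field="NAME_LETTERS"):
    # One query term per distinct letter; mm = how many of them must match.
    letters = sorted(set(word.upper()))
    return {
        "defType": "edismax",
        "q": " ".join(letters),
        "qf": field,  # hypothetical per-letter field
        "mm": str(min_common),
    }


def would_match(name, word, min_common):
    """Local check: does `name` share at least `min_common` distinct letters with `word`?"""
    return len(set(name.upper()) & set(word.upper())) >= min_common


params = build_params("ABCDEF", 4)
print(urlencode(params))  # query string to append to the /select URL

print(would_match("BxxAxDExxx", "ABCDEF", 4))  # True: B, A, D, E
print(would_match("AxxAxxBxxC", "ABCDEF", 4))  # False: only A, B, C
```

Because each letter is a separate optional term, a name containing A twice still satisfies only one of the six terms, which is the case the regex could not exclude.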