Background:
- Solr 4.10;
- Linux/Java (I guess they are irrelevant at this point).
I have this word: ABCDEF. I need to find all documents whose name (field: NAME) has a given number of letters in common with ABCDEF. For example, for a 4-letter match, ABCDEF should match:
- itself (ABCDEF, ABCD, BCDE, etc.);
- various permutations: B..A.DE..., ..F..A.DE..., where the dots stand for letters other than the ones in ABCDEF.
I could try a regex (for example ^(.*?)([ABCDEF](.*?)){4}$), but it would also match A..A..B..C (A appears twice), and I don't want that match.
The field is of type string. However, adding another field that is tokenized and indexed differently would not be a problem.
Also, fuzzy search/match is not an option, as I need those exact letters to appear in the matched field.
Any idea?
Thanks!
Index each unique letter (or key) as a separate token. You can either split the value in your indexing code or use an update processor to split the field into characters. Be sure to use a field type that doesn't drop short tokens (for example, by treating them as stop words).
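As a rough illustration of the "split it in your indexing code" option, here is a minimal Python sketch. The field name NAME_LETTERS and both helper functions are hypothetical (they are not part of Solr), and the sketch assumes the names contain plain letters and that matching should be case-insensitive.

```python
def unique_letter_tokens(name):
    """Return the distinct letters of `name`, uppercased, in first-seen order."""
    seen = []
    for ch in name.upper():
        # Keep only letters, and keep each letter once so that a repeated
        # letter (e.g. the two A's in AXXAXXBXXC) becomes a single token.
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return seen


def build_doc(doc_id, name, letters_field="NAME_LETTERS"):
    """Build a document dict with the original name plus one token per letter.

    `letters_field` is a hypothetical multi-valued field; the original NAME
    field is left untouched.
    """
    return {"id": doc_id, "NAME": name, letters_field: unique_letter_tokens(name)}


if __name__ == "__main__":
    print(build_doc("1", "AXXAXXBXXC"))
```

Because each letter is kept only once, a name like A..A..B..C contributes a single A token, so it shares only three letters (A, B, C) with ABCDEF.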
When you have a field with each letter / key by itself, use the mm parameter of (e)dismax to set the number of terms that have to match, and provide the letters / keys to search for as separate terms.
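A minimal sketch of building such a request in Python follows. The defType, q, qf and mm parameters are standard (e)dismax parameters; the field name NAME_LETTERS and the helper function are assumptions made for illustration only.

```python
from urllib.parse import urlencode


def build_letter_query(word, min_match, letters_field="NAME_LETTERS"):
    """Build edismax parameters requiring `min_match` letters of `word` to match.

    `letters_field` is a hypothetical field holding one token per unique letter.
    """
    letters = []
    for ch in word.upper():
        # One query term per distinct letter of the search word.
        if ch.isalpha() and ch not in letters:
            letters.append(ch)
    return {
        "defType": "edismax",
        "q": " ".join(letters),   # each letter is a separate term
        "qf": letters_field,      # search only the per-letter field
        "mm": str(min_match),     # minimum number of terms that must match
    }


if __name__ == "__main__":
    params = build_letter_query("ABCDEF", 4)
    print(urlencode(params))
```

The resulting query string can be appended to the select handler URL of your core. Since mm counts matching query terms, a document is returned only if at least four distinct letters of ABCDEF occur in its letter field.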