Using Lucene Fuzzy search with a word that has no aliases

I want to run searches using fuzzy search. Using Luke to help me, if I search for a word that has aliases (i.e. similar words), it all works as expected:

[Screenshot: fuzzy query with results]

However, if I enter a search term that doesn't have any similar words (e.g. a serial code), the search fails and I get no results, even though it should be valid:

[Screenshot: fuzzy query returning no results]

Do I need to structure my search in a different way? Why doesn't the second search behave like the first, just with a single "term"?

There are 1 best solutions below

You have not specified the Lucene version, so I will assume you are using 6.x.x. The behavior you are seeing is the correct behavior of Lucene fuzzy search.

Refer to the FuzzyQuery documentation, from which I quote:

At most, this query will match terms up to 2 edits.

Roughly speaking (though not precisely), this means that a FuzzyQuery returns an indexed term as a match only if it differs from the search text by at most two single-character edits (insertions, deletions or substitutions), at any positions.

Below is sample output from one of my simple Java programs, shown here for illustration.

Let's say three indexed docs have a field with the values "123456787", "123456788" and "123456789" (7, 8 and 9 appended to 12345678).

Results:

No Hits Found for search string -> 123456 ( Edit distance = 3 , last 3 digits are missing)

3 Docs found !! for Search String -> 1234567 ( Edit distance = 2 )

3 Docs found !! for Search String -> 12345678 ( Edit distance = 1 )

1 Docs found !! for Search String -> 1236787 ( Edit distance = 2 to "123456787", which is missing only 4 and 5; the other two documents also differ in the last digit, so their edit distance is 3 )

No Hits Found for search string -> 123678789 ( Edit distance is at least 3 to every document: 4 and 5 are missing and the extra digits do not line up )
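The hit counts above can be reproduced without Lucene using a plain Levenshtein distance. This is only a minimal sketch for illustration: the class and method names (`EditDistanceDemo`, `levenshtein`, `countMatches`) are made up here, and Lucene's FuzzyQuery does not work this way internally (it builds an automaton and by default also treats a transposition as a single edit, which does not change the results for these particular strings).

```java
import java.util.Arrays;
import java.util.List;

class EditDistanceDemo {

    // Classic Levenshtein distance (insert, delete, substitute) using two DP rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Number of indexed terms within maxEdits of the query (hypothetical helper).
    static long countMatches(String query, List<String> terms, int maxEdits) {
        return terms.stream()
                    .filter(t -> levenshtein(query, t) <= maxEdits)
                    .count();
    }

    public static void main(String[] args) {
        List<String> indexed = Arrays.asList("123456787", "123456788", "123456789");
        List<String> queries =
            Arrays.asList("123456", "1234567", "12345678", "1236787", "123678789");
        for (String q : queries) {
            System.out.println(q + " -> " + countMatches(q, indexed, 2) + " doc(s)");
        }
    }
}
```

With a limit of 2 edits, the sketch reports 0, 3, 3, 1 and 0 matching documents for the five search strings, in line with the output listed above.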

So you should read more about Edit Distance.

If your requirement is to match N contiguous characters without worrying about edit distance, then N-gram indexing using NGramTokenizer is the way to go.
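To give a feel for why N-grams help here, the following plain-Java sketch splits strings into fixed-size character N-grams and checks whether a query shares any gram with an indexed value. It is not Lucene code: `NGramDemo`, `ngrams` and `sharesGram` are hypothetical names, and the gram size of 3 is an arbitrary assumption (Lucene's NGramTokenizer is configured with a minimum and maximum gram size instead).

```java
import java.util.LinkedHashSet;
import java.util.Set;

class NGramDemo {

    // All contiguous substrings of length n, in order of first appearance.
    static Set<String> ngrams(String text, int n) {
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // True if the query and the indexed value have at least one n-gram in common.
    static boolean sharesGram(String query, String indexed, int n) {
        Set<String> common = ngrams(query, n);
        common.retainAll(ngrams(indexed, n));
        return !common.isEmpty();
    }

    public static void main(String[] args) {
        // "123456" is 3 edits away from "123456789", so a fuzzy query misses it,
        // but the two strings still share the 3-grams 123, 234, 345 and 456.
        System.out.println(ngrams("123456", 3));
        System.out.println(sharesGram("123456", "123456789", 3));
    }
}
```

Matching on shared grams is what lets a partial serial code find the full value regardless of how many characters are missing, at the cost of a larger index.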

See this too for more about N-Gram