Unexpected case sensitivity


I am a noob running Elasticsearch 1.5.9. I want to pull out all of the documents that have the field "PERSON" set to "Johnson" (note the mixed casing). If I look manually in elasticsearch-head, I can see a document with exactly those attributes.

The docs explain that I should construct a filter query to pull out this document. But when I do so, I get some unexpected behavior.

This works. It returns exactly one document with PERSON = "Johnson", as expected:

query = {"filter": {"term" : { "PERSON" : "johnson" }}}

But this does not work

query = {"filter": {"term" : { "PERSON" : "Johnson" }}}

If you look closely, you'll see that the good query is lowercase but the bad query is mixed case -- even though the PERSON field is set to "Johnson".

Adding to the weirdness, I am lowercasing everything that goes into the full_text field: "_source": { "full_text": "all lower case" So the full text includes "johnson", which I would think would be totally independent of the PERSON field.

What's going on? How do I do a mixed case search on the PERSON field?

Best answer

A term query won't analyze your search text. This means you need to analyze the text yourself and provide it in token form for the term query to actually work. Use a match query instead, and things will work like magic.

When a string like the one below goes to Elasticsearch, it is tokenized (or rather, analyzed) and stored:

"Green Apple" -> ( "green" , "apple")

This is the default behavior of analysis. When you search using a term query, no analysis happens. So for the word Apple, it searches for the token Apple with its case preserved. The index only contains the lowercased token apple, so the search fails.
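To make the mismatch concrete, here is a minimal sketch in plain Python that imitates the behavior described above. It does not talk to Elasticsearch. The `analyze` function is a simplified stand-in for the default analyzer (it only splits on non-alphanumeric characters and lowercases), and `term_lookup` is a hypothetical helper that mimics how a term query looks up its input without analyzing it.

```python
import re

def analyze(text):
    """Simplified stand-in for the default analyzer: split into words, lowercase each."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

# Build a tiny inverted index: token -> set of document ids.
docs = {1: "Green Apple", 2: "Johnson"}
index = {}
for doc_id, text in docs.items():
    for token in analyze(text):
        index.setdefault(token, set()).add(doc_id)

def term_lookup(token):
    """Mimics a term query: the input is looked up exactly as given, with no analysis."""
    return index.get(token, set())

print(analyze("Green Apple"))   # ['green', 'apple']
print(term_lookup("Johnson"))   # set()  (no token with this casing was indexed)
print(term_lookup("johnson"))   # {2}
```

Only the lowercased token ever reaches the index, which is why the lowercase term finds the document and the mixed-case one does not.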

A match query, on the other hand, does analyze the search text. If you search with Apple, it converts it to apple and then does the search, which gives good matches.
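Applied to the question, the request body could look something like the sketch below. The `build_match_query` helper is hypothetical and only builds the dictionary. How you send it depends on the client you are using, which is not shown here.

```python
import json

def build_match_query(field, text):
    """Build a search body with a match query, which analyzes `text` before searching."""
    return {"query": {"match": {field: text}}}

# Mixed-case input is fine here, because the match query lowercases it
# through the same analysis that was applied at index time.
query = build_match_query("PERSON", "Johnson")
print(json.dumps(query))  # {"query": {"match": {"PERSON": "Johnson"}}}
```

With this body, searching for "Johnson" or "johnson" should hit the same document, assuming the PERSON field uses the default analyzer as described above.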

You can learn more about analysis in the Elasticsearch documentation.