I'm a bit puzzled by the way term queries work on text fields (I don't even know if it's ok to use them on text fields).
This is my index, which uses the standard analyzer:
{
  "my-index-000001" : {
    "mappings" : {
      "properties" : {
        "city" : {
          "type" : "text",
          "fields" : {
            "raw" : {
              "type" : "keyword"
            }
          }
        }
      }
    }
  }
}
And this is the data it has so far:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "city" : "New York"
        }
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "city" : "York"
        }
      }
    ]
  }
}
This query matches both documents in the index:
GET my-index-000001/_search
{
  "from": 0,
  "size": 20,
  "timeout": "20s",
  "query": {
    "wildcard": {
      "city": {
        "value": "yor*"
      }
    }
  }
}
As you can see, the casing of the query does not match either of the existing documents (both contain "York"). If I query for "yOR*" instead, both documents are still matched. When I run the same query against "city.raw", which is a keyword field, there is no match.
According to the docs, term-level queries should not analyze the search terms, which does not seem to be true if the field type is text. Is this intended or a bug? Is it safe to use term queries on text fields? (If not, why?)
Thank you.
When a field is of the "keyword" type, the text is indexed as it is in Elasticsearch rather than being analyzed at index time. For example, "New York" is stored as "New York".
When a field is of the "text" type, the text is analyzed at index time and the resulting tokens are stored in Elasticsearch. For example, "New York" is broken down into "new" and "york".
As a result, you get both documents back when searching for "yor*" in the "city" field. The documentation also mentions that term-level queries work on the terms stored in Elasticsearch and do not perform any search-time analysis.
However, it is best to use term-level queries with "keyword" type fields.
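To make the difference concrete, here is a small Python sketch that imitates the behaviour described above. It is only a toy model, not how Elasticsearch is implemented: the helper names (analyze_text, index_terms, wildcard_matches, search) are made up for illustration, the regex tokenizer is a rough stand-in for the standard analyzer, and fnmatchcase is a stand-in for wildcard matching. The pattern is deliberately left untouched, so the sketch reproduces the "yor*" results but does not model the "yOR*" case from the question.

```python
import re
from fnmatch import fnmatchcase

# The two documents from the question.
docs = {"1": "New York", "2": "York"}


def analyze_text(value):
    """Rough stand-in for the standard analyzer (assumption): split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def index_terms(value, field_type):
    """Return the terms that would be indexed for a value in this toy model."""
    if field_type == "keyword":
        # keyword: the whole value is kept as a single, unmodified term
        return [value]
    # text: the value is analyzed into lowercase tokens
    return analyze_text(value)


def wildcard_matches(pattern, terms):
    """Case-sensitive glob match of the raw pattern against each indexed term.

    fnmatchcase also gives '[...]' a special meaning, which is fine for the
    simple patterns used here but differs from a real wildcard query.
    """
    return any(fnmatchcase(term, pattern) for term in terms)


def search(pattern, field_type):
    """Return the ids of documents whose indexed terms match the pattern."""
    return sorted(
        doc_id
        for doc_id, city in docs.items()
        if wildcard_matches(pattern, index_terms(city, field_type))
    )


print(search("yor*", "text"))     # ['1', '2'] -> both contain the token "york"
print(search("yor*", "keyword"))  # []         -> "New York" and "York" are stored as-is
print(search("Yor*", "keyword"))  # ['2']      -> only the exact value "York" starts with "Yor"
```

In this model the "text" case behaves like the "city" field and the "keyword" case behaves like "city.raw": the pattern is compared against whatever terms were written at index time.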