Problems with elasticsearch where '*' is the field value


I should preface this by saying that I understand * is a special character that should be escaped in elasticsearch queries. Here's the setup and the trouble I'm facing. The basic problem is that I'm unable to search for fields containing only '*'.

curl -XPUT 'http://localhost:9200/test_index/test_item/1' -d '{
    "some_text" : "*"
}'
curl -XPUT 'http://localhost:9200/test_index/test_item/2' -d '{
    "some_text" : "1+*"
}'
curl -XPUT 'http://localhost:9200/test_index/test_item/3' -d '{
    "some_text" : "asterisk"
}'

curl -XGET 'http://localhost:9200/test_index/_search?q=some_text:*'

Results:
"hits":{"total":2,"max_score":1.0,"hits":[
    "_source":{"some_text" : "1+*"},
    "_source":{"some_text" : "asterisk"}
]


curl -XGET 'http://localhost:9200/test_index/_search?q=some_text:\*'

Results:
"hits":{"total":0,"max_score":null,"hits":[]}

Using python elasticsearch:

>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch()
>>> es.search(index='test_index', doc_type='test_item', body={"query":{"match":{"some_text":"*"}}})

No hits

>>> es.search(index='test_index', doc_type='test_item', body={"query":{"match":{"some_text":"asterisk"}}})

One hit ('asterisk')

>>> es.search(index='test_index', doc_type='test_item', body={"query":{"match":{"some_text":"\*"}}})

No hits



Using pyelasticsearch:

>>> es.search('some_text:*', index='test_index')

2 hits, '1+*' and 'asterisk'

>>> es.search('some_text:\*', index='test_index')

No hits

How can I get the first item to show up in a search? Despite the inconsistencies between the various search methods, all of them seem to agree that I'm not allowed to get '*' back, but why? Also, escaping * seems to make the problem worse, which is unusual. (I assume there may be some auto-escaping in the libraries, but that doesn't explain the direct ES query.)

Edit: I should mention that it is definitely indexed.

>>> es.get('test_index', 'test_item', 1)

{'_index': 'test_index', '_version': 1, '_id': '1', 'found': True, '_type': 'test_item', '_source': {'some_text': '*'}}

It may be that it's merely stored rather than searchable, though; as far as I know, "stored" is a distinct notion in elasticsearch?

Edit2: ElasticSearch docs that talk about escaping some

There is 1 solution below

I ended up solving this by changing the field's analyzer to the whitespace analyzer. (It was a Lucene analysis issue rather than an elasticsearch one, which is why it was tough to find!)
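
For anyone wondering why the analyzer matters: the default (standard) analyzer discards punctuation at index time, so a value of just '*' most likely produces no terms at all and there is nothing for any query to match, while '1+*' is reduced to the single term '1'. That would also explain why some_text:* returned only two hits, since it matches documents that have at least one term in the field. The sketch below is only a rough pure-Python imitation of the two analyzers (the regex is an assumption, not Lucene's actual tokenizer), followed by a hypothetical index body that applies the whitespace analyzer to some_text. The mapping layout assumes an older elasticsearch release that still uses document types and the "string" field type, as in the question; the answer does not show the exact settings that were used.

```python
import json
import re


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer (an approximation only):
    keep runs of word characters, lowercase them, drop punctuation like '*' and '+'."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def whitespace_tokens(text):
    """Imitation of the whitespace analyzer: split on whitespace, keep everything else."""
    return text.split()


# The three documents from the question.
docs = {1: "*", 2: "1+*", 3: "asterisk"}

for doc_id, value in docs.items():
    print(doc_id, standard_like_tokens(value), whitespace_tokens(value))

# Hypothetical index body (not taken from the answer) that analyzes
# some_text with the built-in whitespace analyzer.
index_body = {
    "mappings": {
        "test_item": {
            "properties": {
                "some_text": {"type": "string", "analyzer": "whitespace"}
            }
        }
    }
}
print(json.dumps(index_body, indent=2))
```

With the elasticsearch Python client, a body like this could be passed when creating the index, e.g. es.indices.create(index='test_index', body=index_body). The analyzer of an existing field cannot be changed in place, so the documents would need to be indexed again into the new index. A term query for '*' (or an escaped \* in the query string) should then have a term to match.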