I should preface this by saying that I understand * is a special character that has to be escaped in Elasticsearch queries. Here's the setup and the trouble I'm facing. The basic problem is that I'm unable to search for fields containing only '*'.
curl -XPUT 'http://localhost:9200/test_index/test_item/1' -d '{
"some_text" : "*"
}'
curl -XPUT 'http://localhost:9200/test_index/test_item/2' -d '{
"some_text" : "1+*"
}'
curl -XPUT 'http://localhost:9200/test_index/test_item/3' -d '{
"some_text" : "asterisk"
}'
curl -XGET 'http://localhost:9200/test_index/_search?q=some_text:*'
Results:
"hits":{"total":2,"max_score":1.0,"hits":[
"_source":{"some_text" : "1+*"},
"_source":{"some_text" : "asterisk"}
]
curl -XGET 'http://localhost:9200/test_index/_search?q=some_text:\*'
Results:
"hits":{"total":0,"max_score":null,"hits":[]}
Using python elasticsearch:
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch()
>>> es.search(index='test_index', doc_type='test_item', body={"query":{"match":{"some_text":"*"}}})
No hits
>>> es.search(index='test_index', doc_type='test_item', body={"query":{"match":{"some_text":"asterisk"}}})
One hit ('asterisk')
>>> es.search(index='test_index', doc_type='test_item', body={"query":{"match":{"some_text":"\*"}}})
No hits
Using pyelasticsearch:
>>> es.search('some_text:*', index='test_index')
2 hits, '1+*' and 'asterisk'
>>> es.search('some_text:\*', index='test_index')
No hits
How can I get the first item to show up in a search? Despite the inconsistencies between the various search methods, they all agree that I can't get '*' back, but why? Also, escaping * seems to make the problem worse, which is unexpected. (I assume the libraries may do some auto-escaping, but that doesn't explain the direct ES query.)
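For reference, here is a minimal sketch of what escaping for the query string syntax looks like. The helper name escape_query_string is hypothetical (it is not part of elasticsearch-py or pyelasticsearch), and the reserved-character list is the one given in the Elasticsearch query string documentation, which also notes that < and > cannot be escaped and have to be removed.

```python
# Characters reserved by the query string syntax, per the Elasticsearch docs.
# "&&" and "||" are covered by escaping each "&" and "|" individually.
RESERVED = '+-=&|!(){}[]^"~*?:\\/'


def escape_query_string(text):
    """Backslash-escape reserved characters (hypothetical helper)."""
    # "<" and ">" cannot be escaped at all, so they are dropped.
    text = text.replace("<", "").replace(">", "")
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


if __name__ == "__main__":
    for value in ["*", "1+*", "asterisk"]:
        print("some_text:" + escape_query_string(value))
```

Escaping only changes how the query is parsed; it does not change which terms were written to the index when the document was analyzed.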
Edit: I should mention that it is definitely indexed.
>>> es.get('test_index', 'test_item', 1)
{'_index': 'test_index', '_version': 1, '_id': '1', 'found': True, '_type': 'test_item', '_source': {'some_text': '*'}}
It may be that the value is only stored and not actually indexed for search, which, as far as I know, is a separate thing in Elasticsearch?
Update: I ended up solving this by changing the analyzer for the field to the whitespace analyzer. (It was a Lucene issue, not an Elasticsearch one, which is why it was tough to find!)
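To illustrate why the analyzer matters, here is a rough sketch that needs no running cluster. The two tokenizer functions are simplified stand-ins, not the real Lucene analyzers: the default standard analyzer drops punctuation, so '*' produces no terms at all, while the whitespace analyzer splits on whitespace only. The index body assumes an older, typed-mapping version of Elasticsearch (matching the doc_type usage above, where the field type is "string"); newer versions use "text" and have no mapping types.

```python
import re


def standard_like_tokens(text):
    """Approximation of the standard analyzer: keep runs of word
    characters, lowercase them, and discard punctuation."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def whitespace_tokens(text):
    """Approximation of the whitespace analyzer: split on whitespace
    only, so punctuation survives as part of the token."""
    return text.split()


# Assumed index body for an old, typed-mapping Elasticsearch version.
# It has to be supplied when the index is created; documents that were
# already indexed need to be reindexed to pick up the new analyzer.
index_body = {
    "mappings": {
        "test_item": {
            "properties": {
                "some_text": {"type": "string", "analyzer": "whitespace"}
            }
        }
    }
}

if __name__ == "__main__":
    for value in ["*", "1+*", "asterisk"]:
        print(repr(value), standard_like_tokens(value), whitespace_tokens(value))
```

Under these assumptions, '*' yields an empty term list with the standard-like tokenizer, so there is nothing in the index for any query to match, while '1+*' still yields the term '1'. That would be consistent with some_text:* returning documents 2 and 3 but never document 1.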