I am new to Elasticsearch and am currently working with OpenSearch through the AWS OpenSearch Service. In Dev Tools, I have the following query:
GET _search
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must_not": [
        {
          "regexp": {
            "handler_id": "[^_*\\-.;?#$%^@!`,/?+()~<>:'\\[\\]{}]*`"
          }
        }
      ],
      "must": [
        {
          "regexp": {
            "handler_id": "~([^.])*[A-Za-z]{2}[a-zA-Z0-9]{2}[0-9]{8}"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "handler_id.keyword": {
        "order": "asc"
      }
    }
  ]
}
The query above is supposed to return every handler_id that contains no special characters and that also matches the format in the must clause. It works, but it always returns this handler_id: .MP4137879580. I also tried the regex ^[A-Za-z]{2}[a-zA-Z0-9]{2}[0-9]{8}(?![^.]+$), and then "~([^.])*[A-Za-z]{2}[a-zA-Z0-9]{2}[0-9]{8}" to escape the dot, but the ID still showed up.
Please give me some pointers on how to troubleshoot this problem. Thank you!
TLDR:
I tested this on Elasticsearch. Sorry, I am not using OpenSearch and have no plans to start, but the query is simple enough that it should work there too.
There are several problems in your query.
The first one is that, by default, Elasticsearch indexes each string field twice: once in analyzed form and once in non-analyzed form. The analyzed form is stored in handler_id, and your test string is converted into mp4137879580 there (lowercased, split into tokens, with punctuation removed). In handler_id.keyword, your original string is indexed as is. So when you use handler_id in a regexp query, you are searching these converted strings instead of the original ones. The first fix is therefore to use handler_id.keyword in your query.
The second issue is that the regexp in must_not contains an extra backtick at the end, so it never matches anything. Just remove it.
The third issue is that you are using a double negative. Your regex first finds all handler_ids that don't contain punctuation, and then you wrap it in must_not, essentially saying "I don't want any of these". So you need to either move your regex into must, or change your regex to match handlers with punctuation and keep it in must_not. I picked the first solution in my example.
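For reference, here is a minimal sketch of how the three fixes might be combined. It is written in Python only so that the request body can be generated and sanity-checked locally. The variable names and the helper has_no_punctuation are hypothetical, and the format pattern is copied unchanged from the question. The local check uses Python's re module, which only approximates Lucene's regexp syntax, so treat it as a rough illustration rather than a guarantee of what the cluster returns.

```python
import json
import re

# Punctuation-free pattern from the question, with the stray trailing backtick removed.
NO_PUNCTUATION = r"[^_*\-.;?#$%^@!`,/?+()~<>:'\[\]{}]*"

# Format pattern copied unchanged from the question's must clause.
ID_FORMAT = "~([^.])*[A-Za-z]{2}[a-zA-Z0-9]{2}[0-9]{8}"

# Both patterns sit in must (no double negative) and target the
# non-analyzed handler_id.keyword sub-field.
query = {
    "from": 0,
    "size": 10,
    "query": {
        "bool": {
            "must": [
                {"regexp": {"handler_id.keyword": NO_PUNCTUATION}},
                {"regexp": {"handler_id.keyword": ID_FORMAT}},
            ]
        }
    },
    "sort": [{"handler_id.keyword": {"order": "asc"}}],
}

# Request body that can be pasted under `GET _search` in Dev Tools.
print(json.dumps(query, indent=2))


def has_no_punctuation(handler_id: str) -> bool:
    """Local approximation only: Python's re is not Lucene's regexp engine."""
    # fullmatch mimics the regexp query, which must match the whole term.
    return re.fullmatch(NO_PUNCTUATION, handler_id) is not None


for candidate in ["MP4137879580", ".MP4137879580"]:
    print(candidate, has_no_punctuation(candidate))
```

With both clauses in must, an ID is returned only if it passes the punctuation check and the format check, so .MP4137879580 is rejected by the first clause because of its leading dot.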