Elasticsearch regex query in Java not working correctly: is the syntax wrong?


Have not had luck finding answers but here is my situation:

QueryBuilders.regexpQuery("regexExpression","\\-?*[0-9]3475[0-9]6");

QueryBuilders.simpleQueryStringQuery("\\-?*2347566");

I ran both queries. The simple query string query returns 2347566 with or without the "-", but the regex query returns nothing. The value I printed out for the regex query is \-?*[0-9]3475[0-9]6, which is correct.
Does anyone know why this is happening? I set the flag to "all" as well.

To clarify, I have a database of phone numbers that may have + or # as a prefix, and the user can enter those characters, *, and digits in the search field. The search should ignore every + and # in the input, so that if +123 is in the database, a search for either 123 or +123 returns it. I am treating the + as optional, i.e. 0 or 1 occurrences.

For example, a search like +12345 should match against a pattern such as 123[0-9]45[0-9], with or without the + (or the other prefix character).

I have tried a regex search that uses only *, such as 234**789, when I know that many phone numbers are found by an exact match such as 23456789 or 23477789. But when the regex search uses only * or [0-9], it returns nothing, even though the value of the regex is 234[0-9][0-9]789.


There are 2 best solutions below

user3822558 On

In the end, I was querying the wrong field, one that I only thought was there. I had to search across all fields to get any results.

Reilas On

Wikipedia has great documentation on regular expression syntax.
Wikipedia – Regular expression – Syntax.

There are a few errors with your pattern.

First, a hyphen only needs escaping inside a character class, and only when it sits between two characters, where it would otherwise be read as a range operator. As the first, last, or only character in the class it is already a literal.

So the backslash in front of the hyphen in your pattern is unnecessary: outside a character class, a plain - already matches a literal hyphen.
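The hyphen rule can be checked quickly in plain Java. This is only a sketch using java.util.regex, which is not the Lucene engine that Elasticsearch uses, so treat it as an illustration of the general rule rather than of the regexp query itself. The class name HyphenEscapeSketch and the helper fullMatch are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex, not Lucene's regex engine.
class HyphenEscapeSketch {
    // True only if the whole value matches the regex.
    static boolean fullMatch(String regex, String value) {
        return Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        // Outside a character class, a bare hyphen is an ordinary literal.
        System.out.println(fullMatch("-?5", "-5"));     // true
        System.out.println(fullMatch("-?5", "5"));      // true

        // Inside a class, a hyphen between two characters forms a range.
        System.out.println(fullMatch("[a-c]", "b"));    // true

        // Escaped, or placed last, it is a literal hyphen instead.
        System.out.println(fullMatch("[a\\-c]", "b"));  // false
        System.out.println(fullMatch("[a\\-c]", "-"));  // true
        System.out.println(fullMatch("[ac-]", "-"));    // true
    }
}
```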

"... I have tried searching using regex with purely * like 234**789 ..."

In regular expression syntax, the * character is a quantifier that repeats the preceding element zero or more times. This differs from shell glob syntax, where * is a wildcard.

The meta-character you're looking for is the dot character (.), which matches any single character.

Try the following pattern, and see if the results improve.

-?.[0-9]3475[0-9]6

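Additionally, you can optionally use the \d shorthand in place of [0-9]. In a Java string literal it has to be written as "\\d". Whether the regexp query accepts \d depends on the Lucene version behind your Elasticsearch, so [0-9] is the safer choice.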

-?.\d3475\d6
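To see that the two spellings behave the same, here is a small sketch in plain Java. It uses java.util.regex rather than Lucene, and Pattern.matches requires the whole input to match, which resembles how Lucene patterns are anchored to the whole term. The class name DigitClassSketch, its constants, and the sample values are assumptions for illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex, not Lucene's regex engine.
class DigitClassSketch {
    // The two patterns from the answer. Note the doubled backslash that a
    // Java string literal needs in order to produce the regex token \d.
    static final String WITH_RANGES = "-?.[0-9]3475[0-9]6";
    static final String WITH_SHORTHAND = "-?.\\d3475\\d6";

    // True only if the whole value matches the regex.
    static boolean fullMatch(String regex, String value) {
        return Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        // Made-up sample values, not data from the question.
        String[] samples = {"-12347566", "12347566", "1234756"};
        for (String s : samples) {
            System.out.println(s + " ranges=" + fullMatch(WITH_RANGES, s)
                    + " shorthand=" + fullMatch(WITH_SHORTHAND, s));
        }
    }
}
```

Because of the leading dot, both patterns require one extra character before the first digit class, so an 8-digit value matches while a 7-digit one does not.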

As a final note, you mention

"... phone numbers that could have +,# as prefix ..."

Would that not mean the pattern should be

[+#]?\d3475\d6
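Putting the pieces together, one possible way to build such a pattern from the user's input is sketched below. It assumes the rules described in the question: + and # in the input are ignored, each * stands for exactly one digit, and the stored number may carry one optional + or # prefix. The class PhoneRegexSketch and the helper toRegex are hypothetical names, and the check at the end uses java.util.regex only as a stand-in for the Lucene engine, so the resulting string should still be tested against the real index and the correct field.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not an Elasticsearch API.
class PhoneRegexSketch {
    // Builds a regex string from raw search input.
    // Assumed rules: '+' and '#' are dropped, '*' means one digit,
    // digits are kept, and anything else is rejected.
    static String toRegex(String userInput) {
        StringBuilder sb = new StringBuilder("[+#]?"); // optional stored prefix
        for (char c : userInput.toCharArray()) {
            if (c == '+' || c == '#') {
                continue; // ignored in the search input
            } else if (c == '*') {
                sb.append("[0-9]"); // single-digit wildcard
            } else if (c >= '0' && c <= '9') {
                sb.append(c);
            } else {
                throw new IllegalArgumentException("Unsupported character: " + c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String regex = toRegex("+234**789");
        System.out.println(regex); // [+#]?234[0-9][0-9]789
        System.out.println(Pattern.matches(regex, "23456789"));  // true
        System.out.println(Pattern.matches(regex, "+23477789")); // true
        System.out.println(Pattern.matches(regex, "2345789"));   // false
    }
}
```

The returned string could then be passed as the second argument of QueryBuilders.regexpQuery, with the first argument being the field that actually holds the phone numbers.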