REGEXP in MySQL does not understand cyrillic in square brackets. How to fix?

96 Views Asked by At

There is a code, it works, a lot of results are found:

SELECT *
FROM `modx_site_content` AS `msProduct` 
WHERE `msProduct`.`pagetitle` REGEXP '300х250х625'

But I want the number separator to be not only the Russian letter "x", but also the English "x" and the star symbol "*".

First, I change the code like this:

SELECT *
FROM `modx_site_content` AS `msProduct` 
WHERE `msProduct`.`pagetitle` REGEXP '300[х]250[х]625'

I put the Russian "x" in square brackets. I ran a query and found 0 rows. MySQL does not understand Russian "x" in square brackets.

How to fix?

Table in this encoding: utf8_general_ci. Table type: InnoDB.

2

There are 2 best solutions below

7
hjolfaei On

you should use regex symbols:

SELECT *
FROM `modx_site_content` AS `msProduct` 
WHERE `msProduct`.`pagetitle` REGEXP '300[х].?250[х].?625'

if your MySQL version bigger than 8.0 you can use unicode in REGEXP:

\p{}    #Unicode character sets
\P{}    #negated unicode character sets
\u{}    #specify unicode characters using codepoints

Also you can use mysql LIKE operator

MySQL 5.7 Pattern Matching

MySQL 8.0 Pattern Matching

0
Bill Karwin On

Upgrade to MySQL 8.0.

If we test your string matching with MySQL 5.x, it does not work:

https://dbfiddle.uk/PmtEpS11

select '300Х250Х625' regexp '300[Х]250[Х]625' as `match`

+-------+
| match |
+-------+
|     0 |
+-------+

But MySQL 8.0 changed the implementation of regular expressions to use a different library. The new implementation supports multi-byte characters. We test your string matching with MySQL 8.0, it works:

https://dbfiddle.uk/st0gsiJW

select '300Х250Х625' regexp '300[Х]250[Х]625' as `match`

+-------+
| match |
+-------+
|     1 |
+-------+

So in MySQL 8.0, you can test all three characters you are interested in:

select '300x250x625' regexp '300[Хx*]250[Хx*]625' as `match`;

+-------+
| match |
+-------+
|     1 |
+-------+