I have a table containing some regexes. By default the table was created using utf8mb4_general_ci collation. Everything is fine until i try to add a regex containing homoglyphes like this one. The regex // once stored in my database will simply never match.
What is causing this behavior ? I suspect the mysql table row collation to be in fault. If so, what collation should be used to fix this ?
Does changing the collation will affect my other regex already stored in that table (more than 3k, but plain text, no homoglyphes) ?
It is possible to create a COLLATION that treats homoglyphs as equal, but I don't believe such exists in MySQL.
I tried these with the default collation:
and got
MySQL 8.0 updated its regexp processor to handle utf8mb4, so now
So, with a bunch of 'character classes', you can build a regexp to check for any set of B-loolalikes, hence CRYPTO lookalikes.