Matching a Unicode "name" with a JavaScript Regular Expression


In JavaScript we can match individual Unicode codepoints or codepoint ranges by using the Unicode escape sequences, e.g.:

"A".match(/\u0041/) // => ["A"]
"B".match(/[\u0041-\u007A]/) // => ["B"]

But how can we write a JavaScript regular expression that matches a proper name, which may contain any Unicode "letter"? Is there a range that covers all letters, or a special escape sequence or character class for this in JavaScript?

Say my website must validate names that could be in Latin-based languages as well as Hebrew, Cyrillic, or Japanese (Katakana, Hiragana, etc.). Is this feasible in JavaScript, or is the only sane choice to delegate to a backend language with better Unicode support?


There are 2 best solutions below

Best answer:

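XRegExp is a JavaScript library whose Unicode add-ons let regular expressions match Unicode categories, scripts, and blocks (for example \p{L} for any letter):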

http://xregexp.com/plugins/

Another answer:

I use this site to look up the Unicode code points of characters: http://www.fileformat.info.

Unicode Blocks (Basic Latin, ..., Cyrillic, ..., Arabic, and others): http://www.fileformat.info/info/unicode/block/index.htm
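As a rough sketch of how the block charts can be used, the code point ranges can be copied into a character class with \u escapes, which works even in older JavaScript engines. The ranges below are an illustrative, non-exhaustive selection, and the names BLOCK_RANGES and isNameFromBlocks are hypothetical helpers, not part of any library:

```javascript
// Code point ranges read off the Unicode block charts.
// Illustrative only: extend the list for every script you need to accept.
const BLOCK_RANGES = [
  "\\u0041-\\u005A",                               // Basic Latin A-Z
  "\\u0061-\\u007A",                               // Basic Latin a-z
  "\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF", // Latin-1 Supplement letters (skips the × and ÷ signs)
  "\\u0100-\\u024F",                               // Latin Extended-A and Latin Extended-B
  "\\u0400-\\u04FF",                               // Cyrillic block
  "\\u05D0-\\u05EA",                               // Hebrew letters alef..tav
  "\\u3041-\\u3096",                               // Hiragana letters
  "\\u30A1-\\u30FA",                               // Katakana letters
];

// One or more characters, all taken from the ranges above.
const blockNamePattern = new RegExp("^[" + BLOCK_RANGES.join("") + "]+$");

// Hypothetical helper: true when every character of `name` is in a listed range.
function isNameFromBlocks(name) {
  return blockNamePattern.test(name);
}
```

This only covers characters in the Basic Multilingual Plane and a single word without spaces or hyphens, so it is a starting point rather than a complete name validator.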

Unicode Character Categories (not usable in JavaScript regular expressions before ES2018; newer engines support them through \p{...} with the u flag): http://www.fileformat.info/info/unicode/category/index.htm
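In engines that implement ES2018 Unicode property escapes, the categories can be used directly, provided the regular expression carries the u flag. The following is a minimal sketch under that assumption; the set of allowed separators (space, apostrophe, hyphen) and the name isUnicodeName are my own choices for illustration:

```javascript
// Requires an ES2018+ engine: \p{...} is only recognized with the `u` flag.
// \p{L} matches any letter; \p{M} matches combining marks, so names typed
// with decomposed accents (e + U+0301) are accepted as well.
// Each word must start with a letter; words may be joined by a single
// space, apostrophe, or hyphen (an assumption, adjust to your own rules).
const unicodeNamePattern = /^\p{L}[\p{L}\p{M}]*(?:[ '\-]\p{L}[\p{L}\p{M}]*)*$/u;

// Hypothetical helper: true when `name` consists only of letters, marks,
// and the separators allowed above.
function isUnicodeName(name) {
  return unicodeNamePattern.test(name);
}
```

Usage could look like isUnicodeName("Jean-Luc Picard"), which passes, while a string containing digits such as "R2D2" is rejected.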

Letters (A-я): http://www.fileformat.info/info/unicode/char/a.htm

Fonts (which chars are supported in each font): http://www.fileformat.info/info/unicode/font/index.htm

Index for all of the above: http://www.fileformat.info/info/unicode/index.htm