Regular expression to allow characters from all languages in a first name in JS


I want to let users enter a first name or last name in any language, e.g. Spanish, French, or Chinese. Right now I am using this regex:

var firstNameRegex  = /^[a-zA-Z\u3000\u3400-\u4DBF\u4E00-\u9FFF\u00C0-\u017F\-\'\s]{1,45}$/;

But this only works for the English alphabet. When I try to enter French characters like

à, â, ä, é, è, ê, ë, ï, î, ô, ö, ù, û, ü, ÿ, ç

I get an error.

Any suggestions?


There are 2 best solutions below

jakub podhaisky On BEST ANSWER

Use this regex:

 var firstNameRegex = /^[a-zA-Z\u00C0-\u017F\u3000\u3400-\u4DBF\u4E00-\u9FFF\-\'\s]{1,45}$/;

Or this one, if input that consists only of whitespace should be rejected:

var firstNameRegex = /^(?=.*\S)[a-zA-Z\u00C0-\u017F\u3000\u3400-\u4DBF\u4E00-\u9FFF\-\'\s]{1,45}$/
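As a rough sketch of what the `(?=.*\S)` lookahead changes, the snippet below runs both patterns from this answer against a few sample inputs. The names `allowBlank`, `requireVisible`, and `lookaheadSamples` are made up for this illustration:

```javascript
// Pattern without the lookahead: a string of spaces passes.
const allowBlank = /^[a-zA-Z\u00C0-\u017F\u3000\u3400-\u4DBF\u4E00-\u9FFF\-\'\s]{1,45}$/;
// Pattern with the lookahead: at least one non-whitespace character is required.
const requireVisible = /^(?=.*\S)[a-zA-Z\u00C0-\u017F\u3000\u3400-\u4DBF\u4E00-\u9FFF\-\'\s]{1,45}$/;

// "Zo\u00EB" is "Zoë" written with an escape so the file encoding does not matter.
const lookaheadSamples = ["Zo\u00EB", "   ", " Ana "];

const lookaheadResults = lookaheadSamples.map(function (name) {
  return {
    name: name,
    allowBlank: allowBlank.test(name),
    requireVisible: requireVisible.test(name)
  };
});

console.log(lookaheadResults);
```

Only the all-spaces input should differ between the two patterns; names with leading or trailing spaces still pass both, so trimming the input first may be worth considering.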

please see this regex cheatsheet

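// Validate the first-name field whenever the form is submitted.
document.getElementById('nameForm').onsubmit = function(event) {
    // Stop the default submit so the page does not reload.
    event.preventDefault();

    var firstNameRegex = /^[a-zA-Z\u00C0-\u017F\u3000\u3400-\u4DBF\u4E00-\u9FFF\-\'\s]{1,45}$/;
    var input = document.getElementById('firstNameInput').value;

    // Report whether the whole input matches the allowed character set.
    if (firstNameRegex.test(input)) {
        alert("'" + input + "' is a valid name.");
    } else {
        alert("'" + input + "' is NOT a valid name.");
    }
};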
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Regex Name Validation</title>
</head>
<body>

<h2>Enter a First Name</h2>

<form id="nameForm">
  <input type="text" id="firstNameInput" placeholder="First Name">
  <button type="submit">Check Validity</button>
</form>

</body>
</html>

Christopher On

By using Unicode property escapes in the regex, you can test for all commonly used letters.

\p{L} or \p{Letter} matches every Unicode character that is defined as a letter.

For a list of the matching Unicode characters, you can take a look at https://www.compart.com/en/unicode/category. Everything in a category whose abbreviation starts with L matches \p{L}: Lowercase Letter (Ll), Modifier Letter (Lm), Other Letter (Lo), Titlecase Letter (Lt), and Uppercase Letter (Lu).
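A small sketch of how the subcategories relate to \p{L}: it picks one sample character per category (my own picks, not taken from the linked list) and checks it against both its own category and \p{L}. The `u` flag is needed for these escapes to be recognized:

```javascript
// One hand-picked sample per letter subcategory (illustrative choices).
const categorySamples = {
  Lu: "A",       // Uppercase Letter
  Ll: "a",       // Lowercase Letter
  Lt: "\u01C5",  // Titlecase Letter (Latin capital D with small z with caron)
  Lm: "\u02B0",  // Modifier Letter (modifier letter small h)
  Lo: "强"       // Other Letter (CJK ideograph)
};

const categoryResults = Object.entries(categorySamples).map(function (entry) {
  const category = entry[0];
  const ch = entry[1];
  return {
    category: category,
    // Build e.g. /^\p{Lu}$/u for the character's own subcategory.
    matchesOwnCategory: new RegExp("^\\p{" + category + "}$", "u").test(ch),
    // Every subcategory is also covered by the umbrella \p{L}.
    matchesL: /^\p{L}$/u.test(ch)
  };
});

// A digit is in category Nd, so it is not a letter.
const digitIsLetter = /^\p{L}$/u.test("5");

console.log(categoryResults, digitIsLetter);
```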

These escapes (\p{GroupName}) are only parsed when the regex carries the u (unicode) flag; with it, you get the expected result.

let list = [
  "Sören",
  "Jürgen",
  "Téa", 
  "Váquéz", 
  "Garciá", 
  "Zöe",
  "Eleña",
  "Double-Name",
  "强", // examples from https://en.wikipedia.org/wiki/Chinese_given_name
  "秀英",
  "Number 5",
  "Emoji ",
  "Some ",
];

list.forEach(firstName => console.log(
  firstName,
  /^[\p{L}\-'\x20]{1,45}$/u.test(firstName)
));
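To illustrate why the u flag matters, here is a hedged sketch that runs the same pattern with and without it. Without the flag, browsers and Node.js treat \p as an escaped literal p, so the character class only contains p, {, L, }, the hyphen, the apostrophe, and the space. The variable names are invented for this example:

```javascript
// Same pattern text, once with and once without the u flag.
const withUnicodeFlag = /^[\p{L}\-'\x20]{1,45}$/u;
const withoutUnicodeFlag = /^[\p{L}\-'\x20]{1,45}$/;

const flagSamples = ["Sören", "秀英", "pL"];

const flagResults = flagSamples.map(function (name) {
  return {
    name: name,
    withFlag: withUnicodeFlag.test(name),
    withoutFlag: withoutUnicodeFlag.test(name)
  };
});

console.log(flagResults);
```

In this sketch the real names only pass with the flag, while the nonsense input "pL" passes either way: with the flag because both characters are letters, and without it because p and L are literally listed in the class.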

I would not recommend using the meta-character \s in this use case, because it matches more than the plain space. Besides the space character (\x20), it also accepts the tab (\x09), carriage return (\x0D), new line (\x0A), vertical tab (\x0B), and form feed (\x0C) characters, among other Unicode white space.
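The difference can be sketched as follows, comparing a \s variant of the pattern with the \x20 variant used above. The variable names and sample inputs are assumptions made for this example:

```javascript
// Variant that allows any white space via \s.
const withAnyWhitespace = /^[\p{L}\-'\s]{1,45}$/u;
// Variant that allows only the plain space character.
const withSpaceOnly = /^[\p{L}\-'\x20]{1,45}$/u;

const whitespaceSamples = ["Anna Maria", "Anna\tMaria", "Anna\nMaria"];

const whitespaceResults = whitespaceSamples.map(function (name) {
  return {
    name: JSON.stringify(name), // make the tab and newline visible in the log
    anyWhitespace: withAnyWhitespace.test(name),
    spaceOnly: withSpaceOnly.test(name)
  };
});

console.log(whitespaceResults);
```

Only the input with a regular space should pass both variants; the tab and newline inputs pass the \s variant alone.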

More information about the unicode groups can also be found on: