Regex won't match words as expected


I am trying to use XRegExp to test if a string is a valid word according to these criteria:

  • The string begins with one or more Unicode letters, followed by
  • an apostrophe (') followed by one or more Unicode letters, repeated 0 or more times.
  • The string ends immediately after the matched pattern.

That is, it should match these terms:

Hello can't Alah'u'u'v'oo O'reilly

but not these:

eatin' 'sup 'til

I am trying this pattern:

^(\\p{L})+('(\\p{L})+)*$

but it won't match any words that contain apostrophes. What am I doing wrong?
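
For what it's worth, the anchored pattern itself does seem to accept words with apostrophes when it is tested against a whole string. A minimal sketch, assuming native JavaScript RegExp with the `u` flag as a stand-in for XRegExp (the `isWord` helper name is hypothetical):

```javascript
// Assumption: native Unicode property escapes (ES2018) instead of XRegExp.
// A regex literal needs a single backslash; the doubled backslash is only
// required inside a string literal passed to a constructor.
const wordPattern = /^\p{L}+(?:'\p{L}+)*$/u;

// Hypothetical helper: true when the whole string is a valid word.
function isWord(s) {
  return wordPattern.test(s);
}

const accepted = ["Hello", "can't", "Alah'u'u'v'oo", "O'reilly"].map(isWord);
const rejected = ["eatin'", "'sup", "'til"].map(isWord);

console.log(accepted); // [ true, true, true, true ]
console.log(rejected); // [ false, false, false ]
```

If this holds with XRegExp as well, the problem is more likely in how the pattern is applied than in the pattern itself.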

EDIT: The code using the regex

var separateWords = function(text) {
    var word = XRegExp("(\\p{L})+('(\\p{L})+)*$");
    var splits = [];
    for (var i = 0; i < text.length; i++) {
        var item = text[i];
        while (i + 1 < text.length && word.test(item + text[i + 1])) {
            item += text[i + 1];
            i++;
        }
        splits.push(item);
    }
    return splits;
};
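
The loop above looks like the place where words with apostrophes get lost: it grows `item` one character at a time and stops as soon as the candidate fails the test, and every prefix that ends in an apostrophe (such as `can'`) fails. A sketch reproducing this, assuming native RegExp with the `u` flag in place of XRegExp:

```javascript
// Same pattern as in separateWords, written as a native regex literal
// (assumption: the `u` flag stands in for XRegExp's \p{L} support).
const word = /(\p{L})+('(\p{L})+)*$/u;

function separateWords(text) {
  const splits = [];
  for (let i = 0; i < text.length; i++) {
    let item = text[i];
    // Stops at the first extension that is not itself a valid word.
    while (i + 1 < text.length && word.test(item + text[i + 1])) {
      item += text[i + 1];
      i++;
    }
    splits.push(item);
  }
  return splits;
}

console.log(word.test("can"));       // true
console.log(word.test("can'"));      // false: ends in an apostrophe
console.log(separateWords("can't")); // [ 'can', "'t" ]
```

Because `can'` is never a valid word, the loop cannot reach `can't`. The trailing `'t` is accepted only because the pattern in the code has no `^` anchor, so the `t` alone satisfies it.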

There are 2 best solutions below

1
On

Try this regex:

^[^'](?:[\w']*[^'])?$

It first requires that the first character is not an apostrophe. The optional group then matches any number of word characters or apostrophes, followed by one character that is not an apostrophe. If the group matches nothing, the string is a one-character word.
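
A quick check of this pattern against the question's criteria, as a sketch using a native regex literal. Note that `\w` covers only ASCII letters, digits, and underscore, and `[^']` accepts any character other than an apostrophe, so the pattern appears looser than the stated criteria in some cases and stricter in others:

```javascript
const pattern = /^[^'](?:[\w']*[^'])?$/;

console.log(pattern.test("can't"));       // true
console.log(pattern.test("eatin'"));      // false
console.log(pattern.test("'sup"));        // false
console.log(pattern.test("a''b"));        // true: consecutive apostrophes pass
console.log(pattern.test("42"));          // true: digits are not excluded
console.log(pattern.test("h\u00e9llo"));  // false: \w does not match é
```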

0
On

I think you will need to omit the string start/end anchors to match individual words inside a longer string:

"(\\p{L})+('(\\p{L})+)*"

Also, I'm not sure what those capturing groups are needed for (that may depend on your application), but you could shorten the pattern to:

"\\p{L}+('\\p{L}+)*"
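
With the anchors removed, the pattern can be applied globally to pull words out of running text, rather than growing candidates one character at a time. A minimal sketch, assuming native RegExp with the `g` and `u` flags rather than XRegExp (the `extractWords` name is hypothetical):

```javascript
// Assumption: native Unicode property escapes stand in for XRegExp's \p{L}.
const wordRe = /\p{L}+(?:'\p{L}+)*/gu;

// Hypothetical helper: returns every word-like run found in the text.
function extractWords(text) {
  return text.match(wordRe) || [];
}

console.log(extractWords("I can't stop eatin' 'til noon"));
// [ 'I', "can't", 'stop', 'eatin', 'til', 'noon' ]
```

Leading and trailing apostrophes are left out of the matches, so `eatin'` yields `eatin`. Whether that is acceptable depends on the application.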