JavaScript Unicode regex matching NOT a letter or a number

I would like to convert this:

var result = mystring.replace(/[^a-zA-Z0-9]+/g, ' ');

to a working Unicode version so that I can index ONLY letters and numbers. I don't want characters such as -, _, %, <, >, etc. Since JS does not support this natively, I am using XRegExp.

This does not seem to give me any results... Do I have the letter and number part correct here?

<script src="https://unpkg.com/xregexp/xregexp-all.js"></script>
<script>
    var s = `joanthan------______++++++ <me> bornss $%^&\` asdfasdf+++áeé´sé´s , н, п, р, с, т, ф, х, ц, ч`;
    var r1 = XRegExp.replace(s, /[^\p{L}\p{N}]+/g, ' ');
    var r2 = s.replace(/[^a-zA-Z0-9]+/g, ' ');
    console.log(r1);
    console.log(r2);
</script>

Thoughts? Thanks!

Accepted answer:

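According to the XRegExp documentation, XRegExp.replace accepts either a string or a RegExp as the search argument. A string is treated as literal text to search for, not as a pattern. A native regex literal such as /[^\p{L}\p{N}]+/g has already been compiled by the JavaScript engine before XRegExp sees it, and without the u flag the engine reads \p as a plain "p", so that class only excludes p, {, L, } and N. To use XRegExp's Unicode syntax, first create the pattern with the XRegExp constructor and then pass that object to XRegExp.replace.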

<script src="https://unpkg.com/xregexp/xregexp-all.js"></script>
<script>
    var s = `joanthan------______++++++ <me> bornss $%^&\` asdfasdf+++áeé´sé´s , н, п, р, с, т, ф, х, ц, ч`;
    // Build the pattern with XRegExp so that XRegExp itself parses \p{L} (letters) and \p{N} (numbers)
    var match = XRegExp('[^\\p{L}\\p{N}]+', 'g');
    var r1 = XRegExp.replace(s, match, ' ');
    var r2 = s.replace(/[^a-zA-Z0-9]+/g, ' ');

    console.log(r1);
    console.log(r2);
</script>
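To see why the regex literal in the question did not work even when passed to XRegExp.replace, here is a small sketch in plain JavaScript (no XRegExp required). The sample string is made up for illustration; the only difference between the two calls is the u flag.

```javascript
// Made-up sample: ASCII letters, digits, an accented letter and Cyrillic text.
const sample = 'apple_123 caf\u00e9 ++ \u0434\u043e\u043c';

// Without the u flag, \p is a legacy identity escape for the letter "p", so this
// class means "anything except p, {, L, } or N".
const withoutU = sample.replace(/[^\p{L}\p{N}]+/g, ' ');

// With the u flag, \p{L} and \p{N} are Unicode property escapes (letters, numbers).
const withU = sample.replace(/[^\p{L}\p{N}]+/gu, ' ');

console.log(JSON.stringify(withoutU)); // " pp "
console.log(JSON.stringify(withU));    // "apple 123 café дом"
```

The XRegExp constructor approach reaches the same result as the u-flag literal because XRegExp parses \p{...} itself. A native literal without u is never reinterpreted, which is why handing it to XRegExp.replace changed nothing.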

Another answer:

Unicode property escapes such as \p{L} have been part of JavaScript since ES2018, but to use them in a native RegExp (and therefore also in a regex literal passed to XRegExp), you need to set the u (Unicode) flag.

const s = `joanthan------______++++++ <me> bornss $%^&\` asdfasdf+++áeé´sé´s , н, п, р, с, т, ф, х, ц, ч`;
// The u flag turns \p{L} and \p{N} into Unicode property escapes
const r1 = s.replace(/[^\p{L}\p{N}]+/gu, ' ');
console.log(r1);
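If the end goal is indexing, one more detail may matter: \p{L} does not match combining marks (\p{M}), so a word whose accent is stored as a separate code point, such as e followed by U+0301, loses the accent because the mark is treated as a separator. The sketch below is one possible approach, not something from either answer, and the helper name indexTokens is hypothetical. It normalizes to NFC first and also keeps \p{M} for marks that have no precomposed form.

```javascript
// Hypothetical helper: split text into letter/number tokens for indexing.
function indexTokens(text) {
  return text
    .normalize('NFC')                     // compose e.g. "e" + U+0301 into "é" where possible
    .split(/[^\p{L}\p{M}\p{N}]+/u)        // separators: anything not a letter, mark or number
    .filter((token) => token.length > 0); // drop empty strings from leading/trailing separators
}

const tokens = indexTokens('cafe\u0301 ++ 42'); // ['café', '42']
console.log(tokens);
```

Whether to keep \p{M} depends on the index: dropping it gives slightly stricter tokens, at the cost of mangling text in scripts that rely on combining marks.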