Replace all unmatched surrogate pairs with replacement character in JavaScript string


I have a JavaScript string that I'm writing to a file. I need to replace any unmatched surrogate pairs with the replacement character. Is there some regex character class that only matches unpaired surrogates or do I have to do some additional processing?


There are 2 best solutions below

BEST ANSWER

String.prototype.toWellFormed() returns a copy of the string in which every lone surrogate is replaced with the Unicode replacement character, U+FFFD.
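
Since the method is fairly recent, a feature check may be worth having before the string is written out. The following is a minimal sketch, not a definitive implementation: `replaceLoneSurrogates` is a hypothetical helper name, and the fallback simply reuses the regex approach from the other answer on this page.

```javascript
// Hypothetical helper (the name is an assumption, not a built-in).
function replaceLoneSurrogates(s) {
  if (typeof s.toWellFormed === 'function') {
    // Native path: lone surrogates become U+FFFD, valid pairs are untouched.
    return s.toWellFormed();
  }
  // Fallback for runtimes without toWellFormed(): with the `u` flag,
  // \p{Surrogate} only matches surrogates that are not part of a valid pair.
  return s.replace(/\p{Surrogate}/gu, '\uFFFD');
}

console.log(replaceLoneSurrogates('foo \uD834') === 'foo \uFFFD');             // true
console.log(replaceLoneSurrogates('foo \uD834\uDF06') === 'foo \uD834\uDF06'); // true
```

The result can then be passed to whatever file-writing call is already in use.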

Alternatively, a regex with the u flag can do the same:

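function toWellFormed(s) {
  // With the `u` flag a valid surrogate pair is matched as one code point,
  // so \p{Surrogate} only ever matches lone (unpaired) surrogates.
  return s.replace(/\p{Surrogate}/gu, '\uFFFD');
}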
toWellFormed('foo 𝌆')                  // 'foo 𝌆'
toWellFormed('foo \uD834\uDF06')       // 'foo 𝌆'
toWellFormed('foo \uD834')             // 'foo �'
toWellFormed('foo \uDF06\uDF06\uDF06') // 'foo ���'
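
The `u` flag is what makes the regex approach safe. The sketch below illustrates the difference using a plain character class, and also shows a lookaround-based pattern that should work without the `u` flag. The constant names are made up for illustration, and the lookaround variant assumes the engine supports lookbehind.

```javascript
const REPLACEMENT = '\uFFFD';
const paired = 'foo \uD834\uDF06'; // valid pair encoding U+1D306
const lone = 'foo \uD834';         // high surrogate with no partner

// Without `u`, the class matches individual UTF-16 code units,
// so both halves of a valid pair get replaced.
const codeUnitResult = paired.replace(/[\uD800-\uDFFF]/g, REPLACEMENT);

// With `u`, the pair is matched as a single code point (U+1D306),
// which lies outside the surrogate range, so it is left alone.
const codePointResult = paired.replace(/[\uD800-\uDFFF]/gu, REPLACEMENT);

// Variant without `u`: a high surrogate not followed by a low one,
// or a low surrogate not preceded by a high one.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

console.log(codeUnitResult === 'foo \uFFFD\uFFFD');                      // true
console.log(codePointResult === paired);                                 // true
console.log(lone.replace(LONE_SURROGATE, REPLACEMENT) === 'foo \uFFFD'); // true
console.log(paired.replace(LONE_SURROGATE, REPLACEMENT) === paired);     // true
```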