Font ligature for text in node

I am encountering a font ligature issue in a sentence.

The sentence in question is:


Verizon is sunseng BlueJeans as the plaorm struggled to gain tracon against rival services in the video conferencing market


I have a list of ligatures, and some examples are provided here:

const ligatureMap = {
 "": "ti",
 "": "tf",
 ﬅ: "ft",
 "pla": "platf",
 "AT&amp;T": "AT&T",
}

To address this issue, I am attempting to replace the ligatures using the following code:

return text.replace(/[\uE000-\uF8FF]/g, (match) => {
    return ligatureMap[match] || match;
});

but it is not converting the ligature in "plaorm" to "tf", or "&amp;" to "&". How can I solve this?

Answer by T.J. Crowder:

There are at least two problems:

  1. Not all of the keys in the ligatureMap object are one character long, but your regular expression only searches for single-character matches (specifically, for single code unit¹ matches).
  2. At least one of the single-character keys in your object isn't in the range given (ﬅ is char code \uFB05).
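
Both problems can be reproduced in isolation. The sketch below is only an illustration: \uE001 and \uE002 are assumed stand-ins for the private-use characters in your font (the real code points may differ), and sampleMap and sampleText are hypothetical names, not taken from your code.

```javascript
// Stand-in data: "\uE001" and "\uE002" are assumed private-use code points,
// not the ones from the question; "\uFB05" is the ligature mentioned above.
const sampleMap = {
    "\uE001": "ti",
    "\uE002": "tf",
    "\uFB05": "ft",
    "pla\uE002": "platf",
};
const privateUse = /[\uE000-\uF8FF]/g;

// Collect everything the original character class can match.
const sampleText = "trac\uE001on, 3\uFB05, pla\uE002orm";
const matched = sampleText.match(privateUse);

// Keys the character class can never produce as a whole match.
const unreachable = Object.keys(sampleMap).filter(
    (key) => !(key.length === 1 && /^[\uE000-\uF8FF]$/.test(key))
);

console.log(matched);     // only the two single private-use code units
console.log(unreachable); // the \uFB05 key and the multi-character key
```

The callback in your replace is therefore never called with "\uFB05" or with the multi-character key, so those entries are never looked up.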

Separately, there doesn't appear to be an entry in the example ligatureMap for the ligature character in "sunseng" in your example. I've assumed it should be "tt".

To make sure all of your ligatureMap entries are searched for, including the multi-character ones, you can convert your keys into a regular expression alternation (basically, "this key or that key or this other key"), like this:

const rex = new RegExp(
    Object.keys(ligatureMap).map(escapeRegex).join("|"),
    "g"
);

The escapeRegex function there should be whatever your preferred solution for escaping regular expressions is (perhaps one from this question's answers).
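
One detail worth considering when building the alternation: JavaScript tries the alternatives left to right at each position and takes the first one that matches. If one key is a prefix of another, the shorter key wins unless the longer one is listed first. A minimal sketch, assuming hypothetical keys "ab" and "abc" that are not from your map, sorts the keys longest-first before joining them (demoMap, escapeForRegex and buildRegex are illustrative names):

```javascript
// Hypothetical keys chosen only to show the prefix problem.
const demoMap = { ab: "X", abc: "Y" };

// An escaping helper of the same kind as escapeRegex.
function escapeForRegex(string) {
    return string.replace(/[/\-\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Build a global alternation from the keys, in the order given.
function buildRegex(keys) {
    return new RegExp(keys.map(escapeForRegex).join("|"), "g");
}

const unsorted = buildRegex(Object.keys(demoMap));
const sorted = buildRegex(
    Object.keys(demoMap).sort((a, b) => b.length - a.length)
);

const replaceWith = (rex) => "abc".replace(rex, (m) => demoMap[m] ?? m);

console.log(replaceWith(unsorted)); // "ab" is tried first, so "c" is left over
console.log(replaceWith(sorted));   // "abc" is tried first
```

If none of your keys is a prefix of another key, the order makes no difference and the sort is harmless.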

Here's an example using the escapeRegex from this answer (just as an example):

const text =
    "Verizon is sunseng BlueJeans as the plaorm struggled to gain tracon against rival services in the video conferencing market. Testing: AT&T, 3ſt";

const ligatureMap = {
    "": "ti",
    "": "tf",
    "": "tt",
    ſt: "ft",
    "pla": "platf",
    "AT&T": "AT&T",
};
const rex = new RegExp(
    Object.keys(ligatureMap).map(escapeRegex).join("|"),
    "g"
);

const updated = text.replace(rex, (match) => {
    return ligatureMap[match] || match;
});

console.log(text);
console.log(updated);


function escapeRegex(string) {
    return string.replace(/[/\-\\^$*+?.()|[\]{}]/g, "\\$&");
}


¹ For more about "characters" vs. code points vs. code units, see my blog post "What is a string?"
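
As a quick illustration of that distinction (a sketch using an emoji that is not part of the question): a character outside the Basic Multilingual Plane occupies two code units in a JavaScript string, so a regular expression without the u flag sees it as two separate matches.

```javascript
const face = "\u{1F600}"; // one code point, stored as two UTF-16 code units

const codeUnits = face.length;                  // counts code units
const codePoints = [...face].length;            // string iteration is by code point
const withoutU = face.match(/./g).length;       // without `u`, `.` matches one code unit at a time
const withU = face.match(/./gu).length;         // with `u`, `.` matches a whole code point

console.log(codeUnits, codePoints, withoutU, withU);
```

The private-use range in the question, U+E000 to U+F8FF, lies inside the Basic Multilingual Plane, so each of those characters is a single code unit and the u flag is not needed for them.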