How to determine the display count of a Swift String?

I've reviewed questions such as Get the length of a String and Why are emoji characters like 👩‍👩‍👧‍👦 treated so strangely in Swift strings? but neither covers this specific question.

This all started when trying to apply skin tone modifiers to Emoji characters (see Add skin tone modifier to an emoji programmatically). This led to wondering what happens when you apply a skin tone modifier to a regular character such as "A".

Examples:

let tonedThumbsUp = "\u{1F44D}" + "\u{1F3FE}" // 👍🏾
let tonedA = "A" + "\u{1F3FE}" // A🏾

I'm trying to detect that second case. The count of both strings is 1, and unicodeScalars.count is 2 for both.

How do I determine if the resulting string appears as a single character when displayed? In other words, how can I determine if the skin tone modifier was applied to make a single character or not?

I've tried a few ways to dump information about the string but none give the desired result.

func dumpString(_ str: String) {
    // Grapheme clusters (Swift Characters)
    print("Raw:", str, str.count)
    // Unicode scalar values
    print("Scalars:", str.unicodeScalars, str.unicodeScalars.count)
    // UTF-16 code units
    print("UTF16:", str.utf16, str.utf16.count)
    // UTF-8 code units
    print("UTF8:", str.utf8, str.utf8.count)
    print("Range:", str.startIndex, str.endIndex)
    print("First/Last:", str.first == str.last, str.first, str.last)
}

dumpString("A\u{1f3fe}")
dumpString("\u{1f469}\u{1f3fe}")

Results:

Raw: A🏾 1
Scalars: A🏾 2
UTF16: A🏾 3
UTF8: A🏾 5
First/Last: true Optional("A🏾") Optional("A🏾")
Raw: 👩🏾 1
Scalars: 👩🏾 2
UTF16: 👩🏾 4
UTF8: 👩🏾 8
First/Last: true Optional("👩🏾") Optional("👩🏾")

There are 2 best solutions below

1
BEST ANSWER

What happens if you print on a system that doesn't support the Fitzpatrick modifiers? You get the unmodified base emoji followed by whatever the system uses as its placeholder for an unknown character.

So I think to answer this, you must consult your system's typesetter. For Apple platforms, you can use Core Text to create a CTLine and then count the line's glyph runs. Example:

import Foundation
import CoreText

func test(_ string: String) {
    let richText = NSAttributedString(string: string)
    let line = CTLineCreateWithAttributedString(richText as CFAttributedString)
    let runs = CTLineGetGlyphRuns(line) as! [CTRun]
    print(string, runs.count)
}

test("\u{1F44D}" + "\u{1F3FB}")
test("A" + "\u{1F3FB}")
test("B\u{0300}\u{0301}\u{0302}" + "\u{1F3FB}")

Output from a macOS playground in Xcode 10.2.1 on macOS 10.14.6 Beta (18G48f):

👍🏻 1
A🏻 2
B̀́̂🏻 2
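If a yes/no answer is more convenient than a printed count, the Core Text approach above could be wrapped roughly as follows. This is only a sketch: glyphRunCount and rendersAsSingleRun are hypothetical names, not part of Core Text. The result depends on the fonts installed on the system, and a single run only suggests a single displayed character when the input is already one Character (a plain "AB" also lays out as one run).

```swift
import Foundation
import CoreText

// Hypothetical helper: number of glyph runs Core Text produces when the
// string is laid out with default attributes.
func glyphRunCount(_ string: String) -> Int {
    let attributed = NSAttributedString(string: string)
    let line = CTLineCreateWithAttributedString(attributed as CFAttributedString)
    return CFArrayGetCount(CTLineGetGlyphRuns(line))
}

// Hypothetical predicate: true when the whole string lands in one glyph run.
// Intended for inputs whose count is already 1, such as base + modifier.
func rendersAsSingleRun(_ string: String) -> Bool {
    return glyphRunCount(string) == 1
}
```

With this, rendersAsSingleRun(tonedThumbsUp) and rendersAsSingleRun(tonedA) would be expected to differ on a system whose emoji font supports the modifier, mirroring the 1 versus 2 in the output above.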
2

I think it might be possible to reason about this by looking to see whether the modifier is present and if so whether it has increased the character count.

So for example:

import Foundation // provides contains(_:) taking a String

let tonedThumbsUp = "\u{1F44D}" + "\u{1F3FB}"
let tonedA = "A" + "\u{1F3FB}"
tonedThumbsUp.count // 1
tonedThumbsUp.unicodeScalars.count // 2
tonedA.count // 2
tonedA.unicodeScalars.count // 2
let c = "\u{1F3FB}"
tonedThumbsUp.contains(c) // true
tonedA.contains(c) // true

Okay, so they both contain a modifier character, and they both contain two unicode scalars, but one is count 1 and the other is count 2. Surely that's a useful distinction.
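A related way to reason about the modifier without relying on count is to inspect the Unicode properties of the scalars themselves. The sketch below assumes Swift 5's Unicode.Scalar.Properties (isEmojiModifier and isEmojiModifierBase); hasAppliedSkinTone is a hypothetical name. It only reports what the Unicode data says about the scalar pair, so it cannot tell whether the current system's font actually draws the pair as one glyph.

```swift
// Hypothetical helper: true if some skin tone modifier directly follows a
// scalar that Unicode marks as Emoji_Modifier_Base (e.g. U+1F44D).
func hasAppliedSkinTone(_ string: String) -> Bool {
    let scalars = Array(string.unicodeScalars)
    // Walk adjacent scalar pairs: (previous, current).
    for (previous, current) in zip(scalars, scalars.dropFirst()) {
        if current.properties.isEmojiModifier
            && previous.properties.isEmojiModifierBase {
            return true
        }
    }
    return false
}
```

Under these assumptions, a thumbs-up followed by U+1F3FB is reported as toned, while "A" followed by the same modifier is not, because "A" is not a modifier base.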