Correct way to remove last grapheme of CharSequence

The code:

val plainText = "plainText"
val plainTextWithEmoji = "plainText"

println("plainText=$plainText, length=${plainText.length}")
println("plainTextWithEmoji=$plainTextWithEmoji, length=${plainTextWithEmoji.length}")

// Output:
// plainText=plainText, length=9
// plainTextWithEmoji=plainText, length=15

This output implies that an emoji character's length is 2, not 1.
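
To make the difference concrete, here is a minimal sketch. The helper names utf16Length and codePointLength are made up for illustration, and the emoji is assumed to be one that is stored as a surrogate pair (U+1F600 is used below, written as escapes):

```kotlin
// Hypothetical helpers, for illustration only.

// Number of UTF-16 code units, which is what CharSequence.length reports.
fun utf16Length(text: CharSequence): Int = text.length

// Number of Unicode code points; a surrogate pair counts only once here.
fun codePointLength(text: CharSequence): Int =
    Character.codePointCount(text, 0, text.length)
```

For "plainText\uD83D\uDE00", utf16Length returns 11 while codePointLength returns 10, so the single emoji accounts for two units of length.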

When I want to remove the last character:

If I call plainTextWithEmoji.subSequence(0, plainTextWithEmoji.length - 1), the result is wrong, because the emoji character's length is greater than 1.

To get the correct result, I have to call plainTextWithEmoji.subSequence(0, plainTextWithEmoji.length - 2) instead.

But in general, we cannot know whether the last character's length is 1. When we want to remove the last character, simply calling charSequence.subSequence(0, charSequence.length - 1) may return a wrong result.
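
A small sketch of what "wrong" means here, assuming the last character is an emoji stored as a surrogate pair. The helper endsWithLoneHighSurrogate is hypothetical and only serves to detect the broken result:

```kotlin
// Hypothetical helper: true if the sequence ends in a high surrogate
// whose matching low surrogate has been cut off.
fun endsWithLoneHighSurrogate(text: CharSequence): Boolean =
    text.isNotEmpty() && Character.isHighSurrogate(text[text.length - 1])
```

With val s = "plainText\uD83D\uDE00", the helper returns true for s.subSequence(0, s.length - 1), because only half of the emoji was removed, and false for s.subSequence(0, s.length - 2).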

So, is there any way to remove the last grapheme of a CharSequence? Thanks!

There are 1 best solutions below

Finally, I found a solution, inspired by this post. Kotlin strings are indexed in UTF-16 code units, and a single grapheme can span more than one of them, so to call CharSequence.subSequence and get a correct result, we can compute the start index of every grapheme in the text with java.text.BreakIterator:

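import java.text.BreakIterator

// Returns this sequence without its last grapheme; an empty sequence is returned unchanged.
fun CharSequence.removeLast(): CharSequence {
    val graphemeStartIndexes = computeGraphemesStartIndexes(this)
    // The last entry is the start index of the final grapheme, so cut there.
    val cutIndex = graphemeStartIndexes.lastOrNull() ?: return this
    return this.subSequence(0, cutIndex)
}

// Collects the start index (in UTF-16 code units) of every grapheme in the sequence.
private fun computeGraphemesStartIndexes(sequence: CharSequence): List<Int> {
    val breakIterator = BreakIterator.getCharacterInstance()
    breakIterator.setText(sequence.toString())
    val graphemesStartIndexes = mutableListOf<Int>()

    val start = breakIterator.first()
    graphemesStartIndexes.add(start)
    while (breakIterator.next() != BreakIterator.DONE) {
        graphemesStartIndexes.add(breakIterator.current())
    }
    // The final boundary is the end of the text, not the start of a grapheme, so drop it.
    return graphemesStartIndexes.apply { removeAt(size - 1) }
}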

Example:

val plainTextEmojiSequence = "Hello"
val plainTextOnlySequence = "Hi~!"

println(plainTextEmojiSequence.removeLast()) // "Hello"
println(plainTextOnlySequence.removeLast())  // "Hi~"
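
If only the last boundary is needed, the list of start indexes can probably be skipped by walking the iterator backwards from the end. The following is a sketch of that variation, not part of the solution above; the name dropLastGrapheme is made up, and how multi-code-point sequences (for example emoji joined with a zero-width joiner) are grouped depends on the Unicode support of the runtime's BreakIterator:

```kotlin
import java.text.BreakIterator

// Hypothetical alternative to removeLast(): looks up only the final grapheme boundary.
fun CharSequence.dropLastGrapheme(): CharSequence {
    if (isEmpty()) return this
    val iterator = BreakIterator.getCharacterInstance()
    iterator.setText(toString())
    iterator.last()                       // move to the boundary at the end of the text
    val lastStart = iterator.previous()   // boundary where the final grapheme starts
    return if (lastStart == BreakIterator.DONE) this else subSequence(0, lastStart)
}
```

For example, "Hi~!".dropLastGrapheme() gives "Hi~", and "Hello\uD83D\uDE00".dropLastGrapheme() gives "Hello".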