Get the width of Chinese strings correctly


I want to draw a border around the text 这是一个测试, but I cannot get its actual width. With English text it works perfectly.

Screenshot

Here is my analysis:

len tells me this:

这是一个测试 18
aaaaaaaaa 10
つのだ☆HIRO 16
aaaaaaaaaa 10

runewidth.StringWidth tells me this:

这是一个测试 12
aaaaaaaaa 10
つのだ☆HIRO 11
aaaaaaaaaa 10

This is the test program:

package main

import "fmt"

func main() {
    fmt.Println("这是一个测试 |")
    fmt.Println("aaaaaaaaaa | 10*a")
    fmt.Println()
    fmt.Println("这是一个测试 |")
    fmt.Println("aaaaaaaaa | 9*a")
    fmt.Println()
    fmt.Println("Both are not equal to the Chinese text.")
    fmt.Println("The (pipe) lines are not under each other.")
}

[Screenshot: output of the program above]

Question:

How can I get my box (first screenshot) to appear correctly?


Accepted answer:

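Go strings are UTF-8 encoded byte sequences, and len returns the number of bytes, not characters. ASCII characters take 1 byte each, while Chinese characters like these take 3 bytes (UTF-8 uses 1 to 4 bytes per code point). That's by design.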
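The number of bytes per character depends on the code point. A minimal sketch using utf8.RuneLen from the standard library; the helper name byteLengths and the sample characters are illustrative choices, not from the question:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLengths reports how many bytes UTF-8 uses for each rune in s.
func byteLengths(s string) []int {
	var lens []int
	for _, r := range s {
		lens = append(lens, utf8.RuneLen(r))
	}
	return lens
}

func main() {
	// 'a' (ASCII), 'é' (U+00E9), '这' (CJK ideograph), '😀' (outside the BMP)
	s := "a\u00e9这\U0001F600"
	fmt.Println(byteLengths(s)) // [1 2 3 4]
	fmt.Println(len(s))         // 10
}
```

So "3 bytes" holds for common Chinese characters, but code that divides len by a fixed number will break on other text.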

If you want the number of characters (runes) instead, use the standard library's unicode/utf8 package:

c := "这是一个测试" // needs imports "fmt" and "unicode/utf8"
fmt.Printf("String: %s\nLength: %d\nRune Length: %d\n", c, len(c), utf8.RuneCountInString(c))
// String: 这是一个测试
// Length: 18
// Rune Length: 6

A more basic way to count runes is a for range loop:

count := 0
for range "这是一个测试" {
    count++
}
fmt.Printf("Count=%d\n", count)
// Count=6
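The loop counts runes because ranging over a string decodes it as UTF-8: each iteration yields the byte offset where a rune starts, plus the rune itself, so for this string the offsets jump by 3. A minimal sketch (runeOffsets is a hypothetical helper for illustration):

```go
package main

import "fmt"

// runeOffsets returns the byte offset at which each rune of s starts.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s { // i is a byte index that advances one whole rune per iteration
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "这是一个测试"
	for i, r := range s {
		fmt.Printf("byte %2d: %c\n", i, r)
	}
	fmt.Println(runeOffsets(s)) // [0 3 6 9 12 15]
}
```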

There seems to be no direct way to pretty-print mixed Chinese and English strings in tabular form, and text/tabwriter doesn't work in this case either. A small workaround is to use an encoding/csv writer with a tab as the field separator:

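package main

import (
    "encoding/csv"
    "os"
)

func main() {
    data := [][]string{
        {"这是一个测试", "|"},
        {"aaaaaaaaaa", "|"},
        {"つのだ☆HIRO", "|"},
        {"aaaaaaaaaa", "|"},
    }

    w := csv.NewWriter(os.Stdout)
    defer w.Flush()
    w.Comma = '\t' // separate fields with a tab so the terminal's tab stops do the aligning

    for _, row := range data {
        if err := w.Write(row); err != nil {
            panic(err)
        }
    }
}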

This should print the data as expected. Stack Overflow doesn't render it the way my terminal does, but the Go Playground shows the expected output.
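As for why text/tabwriter does not help: its package documentation says it assumes all Unicode code points have the same width, so it pads cells by rune count. In this minimal sketch (alignWithTabwriter is a hypothetical helper), the 6-rune Chinese cell gets 5 padding spaces and the 10-letter cell gets 1, so both lines have exactly 11 runes before the pipe. A terminal that draws each Chinese character two columns wide therefore shows the first pipe 6 columns further right.

```go
package main

import (
	"fmt"
	"strings"
	"text/tabwriter"
)

// alignWithTabwriter formats tab-separated rows with text/tabwriter
// (minwidth 0, tabwidth 8, padding 1, padded with spaces) and returns the result.
func alignWithTabwriter(rows []string) string {
	var sb strings.Builder
	w := tabwriter.NewWriter(&sb, 0, 8, 1, ' ', 0)
	for _, row := range rows {
		fmt.Fprintln(w, row)
	}
	w.Flush()
	return sb.String()
}

func main() {
	fmt.Print(alignWithTabwriter([]string{"这是一个测试\t|", "aaaaaaaaaa\t|"}))
}
```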

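Note: this only works when the strings' display widths are close enough that each one ends before the same tab stop (terminals usually place tab stops every 8 columns). Strings whose widths differ more need a different workaround.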
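One way around that limit, assuming the terminal draws CJK and kana characters two columns wide (which is what the runewidth.StringWidth numbers in the question reflect), is to pad each string by its display width instead of relying on tab stops. In this sketch, displayWidth is a hypothetical, deliberately simplified stand-in for runewidth.StringWidth: it treats only a few common wide Unicode blocks as two columns and ignores combining marks, emoji and ambiguous-width characters, so real code should use a proper East Asian Width table. padRight and box are likewise illustrative helpers for drawing the border from the question.

```go
package main

import (
	"fmt"
	"strings"
)

// displayWidth approximates how many terminal columns s occupies.
// Simplified assumption: runes in a few common wide blocks count as 2 columns,
// everything else as 1 (combining marks, emoji and ambiguous-width runes are not handled).
func displayWidth(s string) int {
	w := 0
	for _, r := range s {
		switch {
		case r >= 0x1100 && r <= 0x115F, // Hangul Jamo initial consonants
			r >= 0x2E80 && r <= 0xA4CF && r != 0x303F, // CJK radicals through Yi, incl. kana and ideographs
			r >= 0xAC00 && r <= 0xD7A3, // Hangul syllables
			r >= 0xF900 && r <= 0xFAFF, // CJK compatibility ideographs
			r >= 0xFF00 && r <= 0xFF60, // fullwidth forms
			r >= 0xFFE0 && r <= 0xFFE6, // fullwidth signs
			r >= 0x20000 && r <= 0x3FFFD: // CJK extension planes
			w += 2
		default:
			w++
		}
	}
	return w
}

// padRight appends spaces until s fills width columns.
func padRight(s string, width int) string {
	if n := width - displayWidth(s); n > 0 {
		return s + strings.Repeat(" ", n)
	}
	return s
}

// box draws an ASCII border around lines, padding each line by display width.
func box(lines []string) string {
	widest := 0
	for _, l := range lines {
		if w := displayWidth(l); w > widest {
			widest = w
		}
	}
	border := "+" + strings.Repeat("-", widest+2) + "+\n"
	var sb strings.Builder
	sb.WriteString(border)
	for _, l := range lines {
		sb.WriteString("| " + padRight(l, widest) + " |\n")
	}
	sb.WriteString(border)
	return sb.String()
}

func main() {
	fmt.Print(box([]string{"这是一个测试", "aaaaaaaaaa", "つのだ☆HIRO"}))
}
```

With this approximation, 这是一个测试 measures 12 columns and つのだ☆HIRO measures 11, matching the runewidth figures in the question. The borders only line up if the terminal font really renders these characters at double width.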

Answer:

Your problem is (as mkopriva points out in comments) a display issue, not amenable to being resolved by any sort of counting trick.

We have the same problem in English when we display variable-pitch (proportional) text instead of monospace text. Compare these lines in a proportional font:

mmmm, tasty
iiii, tasty?

with the same lines in a monospace font:

    mmmm, tasty
    iiii, tasty?

(assuming you are reading this answer in a browser!). We don't need Chinese characters, or anything beyond plain ASCII, to run into this problem!

What you need is a monospaced display font for your Chinese text, or perhaps some software to typeset it in tabular form, and how you get that is ... another question entirely.

Answer:

I think this is what you want:

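package main // in a _test.go file, run with go test

import (
    "fmt"
    "testing"
)

// TestChinese prints each string with its rune count and its byte count.
func TestChinese(t *testing.T) {
    tests := []string{
        "这是一个测试",
        "aaaaaaaaa",
        "つのだ☆HIRO",
        "aaaaaaaaaa",
        "这是aaaaa一个测试",
        "这是一个つの测试",
    }
    for _, tt := range tests {
        fmt.Printf("%s\t%d\t%d\n", tt, len([]rune(tt)), len(tt))
    }
}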

Output:

这是一个测试  6   18
aaaaaaaaa   9   9
つのだ☆HIRO    8   16
aaaaaaaaaa  10  10
这是aaaaa一个测试 11  23
这是一个つの测试    8   24