I wanted to do this:
for i := 0; i < len(str); i++ {
	dosomethingwithrune(str[i]) // takes a rune
}
But it turns out that str[i] has type byte (uint8) rather than rune.
How can I iterate over the string by runes rather than bytes?
To mirror an example given at golang.org, Go allows you to easily convert a string to a slice of runes and then iterate over that, just like you wanted to originally:
runes := []rune("Hello, 世界")
for i := 0; i < len(runes); i++ {
	fmt.Printf("Rune %v is '%c'\n", i, runes[i])
}
Of course, we could also use a range loop as in the other examples here, but this more closely follows your original syntax. In any case, this will output:
Rune 0 is 'H'
Rune 1 is 'e'
Rune 2 is 'l'
Rune 3 is 'l'
Rune 4 is 'o'
Rune 5 is ','
Rune 6 is ' '
Rune 7 is '世'
Rune 8 is '界'
Note that since the rune type is an alias for int32, we must use %c instead of the usual %v in the Printf statement, or we will see the integer representation of the Unicode code point (see A Tour of Go).
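To make the difference between the verbs concrete, here is a minimal sketch that formats the same rune several ways. The helper name describeRune is hypothetical and exists only for this illustration; the verbs themselves (%v, %c, %q, %U) are standard fmt verbs.

```go
package main

import "fmt"

// describeRune is a hypothetical helper for this illustration.
// It formats one rune with several fmt verbs so the results can be
// compared side by side.
func describeRune(r rune) string {
	// %v prints the integer value, %c the character itself,
	// %q the quoted character, %U the Unicode notation.
	return fmt.Sprintf("v=%v c=%c q=%q U=%U", r, r, r, r)
}

func main() {
	fmt.Println(describeRune('世')) // v=19990 c=世 q='世' U=U+4E16
}
```

If the goal is debugging output, %q or %U is often easier to read than %c, because invisible characters such as the space still show up clearly.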
For example:
package main

import "fmt"

func main() {
	// r is used instead of rune to avoid shadowing the built-in type.
	for i, r := range "Hello, 世界" {
		fmt.Printf("%d: %c\n", i, r)
	}
}
Output:
0: H
1: e
2: l
3: l
4: o
5: ,
6:
7: 世
10: 界
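Note that the index in the output above jumps from 7 to 10: range over a string yields the byte offset at which each rune starts, not a rune counter. If a rune index is needed as well, one possible sketch is to collect the offsets first. The helper name runeOffsets is an assumption made for this example, not something from the standard library.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper for this illustration.
// It returns the byte offset at which each rune of s starts; the
// position within the returned slice is the rune index.
func runeOffsets(s string) []int {
	var offsets []int
	for byteIdx := range s {
		offsets = append(offsets, byteIdx)
	}
	return offsets
}

func main() {
	for runeIdx, byteIdx := range runeOffsets("Hello, 世界") {
		fmt.Printf("rune %d starts at byte %d\n", runeIdx, byteIdx)
	}
}
```

For the sample string this reports nine runes, with the last two starting at bytes 7 and 10, matching the output shown above.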
As the documentation states, rune is an alias for int32:
type rune = int32
String literals in Go are encoded in UTF-8, which stores each Unicode code point using one of the following byte patterns:
Byte pattern                           Code points
0xxxxxxx                               0−127
110xxxxx 10xxxxxx                      128−2047
1110xxxx 10xxxxxx 10xxxxxx             2048−65535
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx    65536−0x10ffff
As you can see, this encoding takes 1 to 4 bytes per character. A rune is an int32 (4 bytes), which is wide enough to hold any Unicode code point, including those that need 4 bytes in UTF-8.
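The byte counts described above can be checked with utf8.RuneLen from the standard library. The following is a small sketch; the helper name byteLengths and the sample characters are choices made for this example only.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLengths is a hypothetical helper for this illustration.
// For each rune in s it records how many bytes UTF-8 needs to
// encode that rune.
func byteLengths(s string) []int {
	var lengths []int
	for _, r := range s {
		lengths = append(lengths, utf8.RuneLen(r))
	}
	return lengths
}

func main() {
	// One sample character from each of the four ranges:
	// U+0041, U+00E9, U+4E16 and U+1F600.
	s := "A\u00e9\u4e16\U0001F600"
	fmt.Println(byteLengths(s)) // [1 2 3 4]
	fmt.Println(len(s))         // 10 bytes in total
}
```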
From the table you can see that if your string contains only ASCII characters, the number of runes equals the number of bytes, because each ASCII character is encoded in a single byte. That no longer holds once the string contains non-ASCII characters:
import (
	"fmt"
	"unicode/utf8"
)

func countRunes() {
	s := "Hello, 世界"
	fmt.Println(len(s))                    // "13": bytes
	fmt.Println(utf8.RuneCountInString(s)) // "9": runes
}
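Using range:
for i, r := range "Hello, 世界" {
	fmt.Printf("%d\t%q\t%d\n", i, r, r)
}
Converting the string to a []rune slice:
runes := []rune("Hello, 世界")
for i := 0; i < len(runes); i++ {
	fmt.Printf("Rune : '%c'\n", runes[i])
}
Using the utf8 package:
s := "Hello, 世界"
for i := 0; i < len(s); {
	// Decode one rune at byte offset i, then advance by its encoded size.
	r, size := utf8.DecodeRuneInString(s[i:])
	fmt.Printf("%d\t%c\n", i, r)
	i += size
}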
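One practical difference worth knowing when decoding manually: for a byte that is not part of valid UTF-8, utf8.DecodeRuneInString returns utf8.RuneError together with a size of 1, so the loop still advances. The sketch below uses that to count stray bytes; the helper name countInvalidBytes and the sample inputs are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countInvalidBytes is a hypothetical helper for this illustration.
// It reports how many bytes of s are not part of a valid UTF-8
// sequence. A genuine U+FFFD character in the input decodes with
// size 3, so it is not counted as invalid.
func countInvalidBytes(s string) int {
	invalid := 0
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		if r == utf8.RuneError && size == 1 {
			invalid++
		}
		i += size
	}
	return invalid
}

func main() {
	fmt.Println(countInvalidBytes("Hello, 世界")) // 0
	fmt.Println(countInvalidBytes("a\x80b\xffc")) // 2
}
```

A range loop behaves similarly on such input: it yields U+FFFD for each invalid byte and moves on by one byte.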
Alternatively, here is a code example that doesn't use the fmt package:
package main

func main() {
	for _, r := range "Hello, 世界" {
		println(string(r))
	}
}
In the loop, the variable r represents the current rune being iterated over. We convert it to a string using the string() conversion before printing it to the console.
See this example from Effective Go:
This prints: