I have a program that also accepts non-English characters in an input field. Because we use strlen, it fails to give the expected length when the string contains a non-English character. For input nova, the output is 4, whereas for input ñova, the output is 5, when it should be 4.
strlen("nova") = 4
strlen("ñova") = 5
In the second case, I would expect the output to be 4 instead.
Remember that strlen returns the number of char elements (bytes) in the string, which is not necessarily the same as the number of visible glyphs when it's printed.
The result will depend on your system's character encoding. With ISO-8859-1, "ñova" is the same as { 241, 111, 118, 97, 0 } (length 4), but if you use UTF-8, for example, then ñ is a multi-byte character and the string is represented as { 195, 177, 111, 118, 97, 0 } (length 5).
If you want to count the number of code points, then you probably want to be using mbrlen() instead of strlen(). Note that mbrlen() reports the byte length of a single multibyte character in the current locale's encoding, so you call it in a loop and count the iterations. If you want to count the number of "user" characters, taking account of combining accents and the like, then you really need a character-handling library such as ICU.
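As a rough illustration of the code point counting described above, here is a minimal sketch in C. The function names count_mb_chars and count_utf8_codepoints are made up for this example. The first walks the string with mbrlen() and so depends on the current locale (it assumes setlocale(LC_ALL, "") has been called and that the locale's encoding matches the input). The second is a swapped-in alternative that does not use mbrlen() at all: it assumes the input is valid UTF-8 and simply skips continuation bytes.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count multibyte characters in s using the current
 * locale's encoding. Assumes the caller has already done
 * setlocale(LC_ALL, "") with a locale that matches the input.
 * Returns (size_t)-1 on an invalid or incomplete sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);   /* bytes left, excluding the terminator */
    size_t count = 0;

    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    while (remaining > 0) {
        /* mbrlen() returns the byte length of the next character only */
        size_t n = mbrlen(s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;      /* invalid or truncated sequence */
        if (n == 0)
            break;                  /* defensive: embedded terminator */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}

/* Hypothetical helper: locale-independent count, assuming valid UTF-8.
 * Every code point has exactly one byte that is NOT of the form 10xxxxxx,
 * so counting those bytes counts code points. */
size_t count_utf8_codepoints(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

With the UTF-8 bytes from the answer, count_utf8_codepoints("\xC3\xB1ova") gives 4 while strlen("\xC3\xB1ova") gives 5. Neither function handles combining accents: a base letter followed by a combining mark still counts as two, which is where a library such as ICU comes in.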