strlen function giving wrong length when there are non-English characters in string

I have a program whose input may contain non-English characters. We use strlen to compute the length of the input, and it does not return the length we expect when the string contains such a character. For the input nova the output is 4, but for the input ñova the output is 5, whereas it should be 4.

  1. strlen("nova") = 4
  2. strlen("ñova") = 5

In the 2nd case, I would expect the output as 4 instead.

There is 1 best solution below

Remember that strlen returns the number of char elements (bytes) before the terminating null, which is not necessarily the same as the number of visible glyphs when the string is printed.

The result will depend on your system's character encoding. With ISO-8859-1, "ñova" is the same as {241, 111, 118, 97, 0} (length 4), but if you use UTF-8, for example, then ñ is a multi-byte character and the string is represented as {195, 177, 111, 118, 97, 0} (length 5).
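To make the byte counts above concrete, here is a small sketch that spells out both encodings with hex escapes, so the result does not depend on how the source file itself is saved. The names nova_latin1, nova_utf8 and byte_length are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* "ñova" written out byte by byte (illustrative names, not from the question). */
static const char nova_latin1[] = "\xF1ova";     /* ISO-8859-1: 241 111 118 97 0     */
static const char nova_utf8[]   = "\xC3\xB1ova"; /* UTF-8:      195 177 111 118 97 0 */

/* Hypothetical wrapper: strlen counts chars (bytes) up to the terminating 0. */
static size_t byte_length(const char *s)
{
    return strlen(s);
}
```

Calling byte_length(nova_latin1) gives 4 and byte_length(nova_utf8) gives 5, which matches what the question observes on a UTF-8 system.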

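If you want to count the number of code points, note that mbrlen() is not a drop-in replacement for strlen(): a single call only reports how many bytes the next multibyte character occupies. You would call it in a loop, advancing by its return value and counting the iterations, after selecting a suitable locale with setlocale(). If you want to count the number of "user" characters, taking account of combining accents and the like, then you really need a character-handling library such as ICU.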
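As a rough sketch of code point counting, the function below takes a different route from mbrlen(): it assumes the input is valid UTF-8 and ignores the locale entirely, counting every byte that is not a continuation byte. The name utf8_codepoints is invented for this example, and invalid input is not detected.

```c
#include <stddef.h>

/* Count code points in a null-terminated string.
   Assumption: the bytes are valid UTF-8; nothing is validated here. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; ++s) {
        /* UTF-8 continuation bytes have the form 10xxxxxx.
           Every other byte starts a new code point. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

With this, utf8_codepoints("\xC3\xB1ova") returns 4 where strlen returns 5. It still counts code points rather than "user" characters: an n followed by a combining tilde ("n\xCC\x83ova") counts as 5, which is the case that calls for a library such as ICU.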