I'm studying C from Kernighan and Ritchie (1988), which uses ASCII for character manipulation. In chapter 2, they start using the header file ctype.h. From searching the internet and reading the comments in the ctype.h file, I understand that it is meant for ASCII, so it makes sense that it doesn't work as well for other character encodings such as UTF-8.
I was printing the value of iscntrl() for the values 0-32 and 127-159 (decimal). I expected it to return 0 or 1, but instead it returns 0 or 32.
Why doesn't it return 0 or 1? And is there a ctype.h for UTF-8?
The is* functions from ctype.h return zero if the character does not meet the condition and non-zero otherwise. Any non-zero value is allowed, and 32 is non-zero.
According to the cppreference page for iscntrl, it should return non-zero for 0-31 and 127 in ASCII.
As for a ctype.h for UTF-8, the short answer is no. UTF-8 is a multibyte encoding, and the functions from ctype.h are for single-byte narrow characters.
The standard way, when you have a string containing a multibyte character (in the C programming sense), is to first set the appropriate locale for your environment and then convert the character to a wide character by calling mbtowc. Then you can use the isw* functions from wctype.h to identify the character category.
It returns something other than 0 or 1 because of a tiny speed gain that was relevant 50 years ago. Converting the bitwise result to 0/1 is an additional operation. Nowadays it doesn't make a difference, which is why modern programming languages have bool. 50 years ago, there was no bool in C, and single operations mattered much more.
It is usually implemented with a big table that maps every character to a single byte of flags.
Then all iscntrl does is check, with a bitwise AND, whether a bit in the table entry is set.
Because the result of & is equal to _ISCTRL_FLAG when the bit is set, the result is 32 or 0.