Unsigned char* instead of char*


Why do most C functions that take a string parameter declare it as char* and not unsigned char*? According to my understanding, unsigned char* is more generic. And how should I handle situations where I have an unsigned char* and need to pass it correctly to a char* parameter?

For example, is it good style to call a function as in the sample below?

unsigned char *a = ...;
strlen((char *)a);

Answer by Elzaidir

The C standard states:

An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.

So char is guaranteed to represent every member of the basic execution character set as a nonnegative value; for any other character, the value is implementation-defined.

Moreover:

The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

Whether char is signed or unsigned is implementation-defined.
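A program can check which choice its implementation made by using the CHAR_MIN, CHAR_MAX and UCHAR_MAX macros from <limits.h>. A minimal sketch; the two function names are hypothetical:

```c
#include <limits.h>
#include <stdbool.h>

/* True if plain char behaves like signed char on this implementation.
   CHAR_MIN is 0 when char is unsigned and equals SCHAR_MIN when signed. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* True if plain char has exactly the range of unsigned char. */
bool plain_char_matches_unsigned(void)
{
    return CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX;
}
```

Exactly one of the two functions returns true on any conforming implementation, because char must match either signed char or unsigned char.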

The type char is, thus, the type the implementation defines as the best fit for storing characters. There is no need to cast it to unsigned char.

Also, you say that "According to my understanding unsigned char* is more generic." You may be referring to the fact that the signedness of char is implementation-defined, so using unsigned char makes behavior more predictable across implementations. That is true when you use the values for arithmetic, but for the basic execution character set the standard guarantees that the values are never negative, so such strings behave predictably on every implementation.
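To illustrate the arithmetic case, here is a sketch (both function names are hypothetical) that sums the same bytes in two ways. Reading through unsigned char gives the same total everywhere; reading through plain char gives a different total for bytes above 127 wherever char is signed (on the usual two's-complement platforms with 8-bit bytes, 0xE9 reads as -23 instead of 233):

```c
#include <limits.h>
#include <stddef.h>

/* Sums bytes read through unsigned char: each byte contributes a value
   in 0..UCHAR_MAX, whether or not plain char is signed. */
unsigned long sum_as_unsigned(const char *s, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Sums the same bytes read through plain char: if char is signed,
   bytes above CHAR_MAX contribute negative values. */
long sum_as_plain_char(const char *s, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += s[i];
    return total;
}
```

For the bytes {0xE9, 0x01}, sum_as_unsigned returns 234 on every implementation, while sum_as_plain_char returns 234 only where char is unsigned.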

Answer by John Bollinger

Why do most C functions that take a string parameter declare it as char* and not unsigned char*?

Because char is the (original) data type intended for character data. Notionally, unsigned char and signed char are for small numbers. Strings and string functions were conceived to serve needs related to character data, so they are defined in terms of type char. It's the semantically consistent thing to do.

According to my understanding unsigned char* is more generic.

I don't know how you got that idea. char, unsigned char, and signed char are distinct types. C does not have any kind of subtyping, but char could be considered more generic than unsigned char in the sense that, as far as the spec is concerned, char fixes only the size of its objects, not their signedness. Or in the sense that it caters to the platform's native treatment and interpretation of character data, whatever that may be. On the other hand, I don't see any case for considering unsigned char more generic than char.

C does have a generic pointer type, in the sense that there are implicit conversions in both directions between that type and all other object-pointer types. That type is spelled void *, but ancient, pre-void C code generally used char * where modern code would use void *.
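A short sketch of those implicit conversions in modern C (the helper name is hypothetical): the void * returned by malloc converts to unsigned char * without a cast, and the unsigned char * converts back to void * when passed to memset. Passing the same unsigned char * to a char * parameter, by contrast, requires an explicit cast.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n bytes and zero them. */
unsigned char *alloc_zeroed_bytes(size_t n)
{
    unsigned char *p = malloc(n);   /* void * -> unsigned char *, no cast */
    if (p != NULL)
        memset(p, 0, n);            /* unsigned char * -> void *, no cast */
    return p;
}
```

calloc would do the same job; the point here is only the direction of the conversions.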

And how should I handle situations where I have an unsigned char* and need to pass it correctly to a char* parameter?

If you are talking specifically about string functions, then you should define your own strings as sequences of char, not of unsigned char, and thereby avoid the issue.

But if you have null-terminated byte sequences on which you want to use the string functions, even though the sequences are not inherently character in nature, then it should pretty much just work. You will need to cast the pointers, like you show in your example, but whether type char is signed or unsigned, it is valid to access unsigned char data as if it were char data (and vice versa), and these types do not have any trap representations that could interfere.
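A minimal sketch of that approach, keeping the casts inside two hypothetical helpers, ustrlen and ustrcmp. One convenient detail: the standard specifies that strcmp compares the first differing characters as if they were unsigned char, so its result does not depend on the signedness of plain char.

```c
#include <string.h>

/* Hypothetical helper: length of a null-terminated unsigned char sequence.
   The cast is required because strlen takes const char *. */
size_t ustrlen(const unsigned char *s)
{
    return strlen((const char *)s);
}

/* Hypothetical helper: strcmp on unsigned char sequences. strcmp already
   compares the differing characters as unsigned char. */
int ustrcmp(const unsigned char *a, const unsigned char *b)
{
    return strcmp((const char *)a, (const char *)b);
}
```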

Do take care, however: arbitrary binary data are generally not null-terminated and cannot easily be converted to null-terminated form. Appending a null byte usually does not help, because it is rarely safe to assume that the data contain no internal null bytes. The string functions are therefore usually inappropriate for such data.
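For such data, a common alternative is to carry an explicit length and use the mem* functions, which examine a given number of bytes and do not stop at zero bytes. A sketch, in which struct byte_buf and find_byte are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: binary data with an explicit length, because
   the data may contain zero bytes and need not be terminated. */
struct byte_buf {
    const unsigned char *data;
    size_t len;
};

/* Index of the first byte equal to b, or -1 if absent. memchr examines
   exactly len bytes and does not stop at zero bytes. */
long find_byte(struct byte_buf buf, unsigned char b)
{
    const unsigned char *hit = memchr(buf.data, b, buf.len);
    return hit != NULL ? (long)(hit - buf.data) : -1;
}
```

For the bytes {0x10, 0x00, 0x20, 0x30}, find_byte locates 0x30 at index 3, whereas strlen on the same bytes reports 1 because it stops at the internal zero byte.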

Or, if you have functions that use char * as a generic pointer type, you are unwilling or unable to convert them to use void * instead, and whatever your unsigned char * points to is suitable for those functions to operate on, then just convert your input pointer and get on with it. You're already assuming that the pointed-to object is acceptable.
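A sketch of that boundary conversion, where legacy_copy stands in for a hypothetical pre-void routine that uses char * as its generic byte pointer. The cast happens once, in a wrapper, and the rest of the code keeps working with unsigned char *:

```c
#include <stddef.h>

/* Hypothetical legacy routine from before void * existed:
   char * serves as its generic byte pointer. */
void legacy_copy(char *dst, const char *src, size_t n)
{
    while (n-- > 0)
        *dst++ = *src++;
}

/* Wrapper that converts the pointers once, at the call boundary. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    legacy_copy((char *)dst, (const char *)src, n);
}
```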