I want to print an NSString byte by byte:
NSString *path = @"/User/user/ǢǨǣ/";
const char *byte_array = [path UTF8String];
for (unsigned char *it = byte_array; *it != 0; ++it) {
    NSLog(@"char: %c \t hex: %02x \n", *it, *it);
}
Producing:
Ç - FFFFFFC7
¢ - FFFFFFA2
Ç - FFFFFFC7
¨ - FFFFFFA8
Ç - FFFFFFC7
£ - FFFFFFA3
This should be the output for Ǣ (C7 A2), Ǩ (C7 A8), ǣ (C7 A3). I think the "FFFFFF" prefix on every "byte" is affecting my code. I'm wondering if there is any way of manipulating paths with special characters in them.
The output is behaving as if each byte were a signed `char` being converted to a signed integer and then displayed as a 32-bit hex string. A `char` passed to a variadic function such as `NSLog` is promoted to `int`, so a signed byte with the high bit set (0xC7) is sign-extended and `%02x` shows `ffffffc7`. While I am unable to reproduce the exact behavior you describe (without manually casting to a signed integer), looking at the headers, `UTF8String` is declared as returning `const char *`, and you also defined your own pointer, `byte_array`, as `const char *`.

Note that neither of those is `unsigned char *`, just `char *`. (FWIW, that seems exceedingly curious to me, as I always think of a "byte" as an `unsigned char`, i.e., a `uint8_t`.)

I personally would use `NSData` and `uint8_t` to avoid ambiguity.
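The signed-versus-unsigned difference can be sketched in plain C, which also compiles inside an Objective-C file. This is only an illustration, not the answer's original code: the helper names `hex_via_signed_char`, `hex_via_uint8`, and `hex_dump` are hypothetical, and the `ffffffc7` result assumes a 32-bit `int`.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats a byte that is held in a signed char.
 * Promotion to int sign-extends the value, so -57 (the bit pattern 0xC7)
 * becomes 0xFFFFFFC7 when int is 32 bits wide (an assumption here). */
static void hex_via_signed_char(signed char c, char *out, size_t n) {
    snprintf(out, n, "%02x", (unsigned int)(int)c);
}

/* Hypothetical helper: the same byte held in a uint8_t is zero-extended,
 * so only the two expected hex digits are produced. */
static void hex_via_uint8(uint8_t b, char *out, size_t n) {
    snprintf(out, n, "%02x", (unsigned int)b);
}

/* Hypothetical helper: writes the bytes of a NUL-terminated string as
 * space-separated hex, walking it through a uint8_t pointer. */
static void hex_dump(const char *s, char *out, size_t n) {
    size_t used = 0;
    if (n > 0) out[0] = '\0';
    /* Each byte needs at most 3 characters (" xx") plus the terminator. */
    for (const uint8_t *p = (const uint8_t *)s; *p != 0 && used + 4 <= n; ++p) {
        used += (size_t)snprintf(out + used, n - used,
                                 used ? " %02x" : "%02x", (unsigned int)*p);
    }
}
```

In the question's loop, the equivalent change would be to iterate with a `const uint8_t *` (or cast each byte to `unsigned char`) before passing it to `NSLog`.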
As an aside, when I cut and pasted your code snippet, rather than receiving `c7a2` for Ǣ, I received `c386` (Æ) followed by `cc84` (the "combining macron", i.e., the "combining" rendition of ¯). I do not know whether Stack Overflow or your editor introduced that, but this is a common problem when looking at hexadecimal representations of UTF-8 characters, as there are multiple possible representations of the same character. If you are really looking at hex UTF-8 representations, you may want to standardize this with, for example, `precomposedStringWithCanonicalMapping`.
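To see why the two forms give different hex, it helps to encode the code points by hand. The sketch below is a small UTF-8 encoder in plain C, written only for illustration (`utf8_encode` is a hypothetical name, not a Foundation or libc function). The precomposed Ǣ is U+01E2, while the decomposed form is Æ (U+00C6) followed by the combining macron (U+0304).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes one Unicode code point as UTF-8.
 * `out` must have room for 4 bytes. Returns the number of bytes written,
 * or 0 if the value is a surrogate or lies above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, uint8_t out[4]) {
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogates are not encodable */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx followed by three 10xxxxxx */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Encoding U+01E2 yields the two bytes C7 A2, whereas U+00C6 and U+0304 yield C3 86 and CC 84, matching the two hex dumps described above. Normalizing the string first is what makes the precomposed bytes appear consistently.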