I have an issue with UTF-7 decoding. I was able to isolate the problem in the following sample code:
NSStringEncoding stringEncoding = myFunctionForTranslateCodepageToEncoding(codePage);
// see the end of the string, it's important
const char * testBuffer ="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa+ADw-";
NSString * testString = [[NSString alloc] initWithBytes:testBuffer length:strlen(testBuffer) encoding:stringEncoding];
Where:
strlen(testBuffer) is 508,
'codePage' is 65000 (the Windows code page for UTF-7),
'stringEncoding' is 2214592768, i.e. 0x84000100, which looks like kCFStringEncodingUTF7 (0x04000100) with the 0x80000000 bit that CFStringConvertEncodingToNSStringEncoding sets, so it does appear to be UTF-7,
'+ADw-' is the UTF-7 sequence for '<'.
In this example testString is always nil, so the conversion fails. But here are the strange things:
- When I remove just one 'a' from testBuffer, the conversion works and testString is created properly. When I add one or more 'a's, it fails again.
- When I 'damage' the UTF-7 encoded symbol at the end (the only one in this example, '+ADw-'), it works fine. I can change it to '.ADw-' or '+ADw.' and the buffer converts properly. Of course the 'damaged' symbol is not decoded (it's written literally), but the conversion works and produces "…aaaaa.ADw-" in the NSString. I can also cut the buffer by one byte, leaving "…aaaaa+ADw", and that also converts properly (the UTF-7 symbol is then incomplete).
- When I add any ASCII character at the end of the buffer, after the UTF-7 symbol, it works. For example, "…aaaaa+ADw-a" is converted into the NSString "…aaa>a".
- When the buffer contains more UTF-7 symbols, the length at which it starts failing changes, so it's not always 508 characters.
- I can use any other UTF-7 symbol at the end; it doesn't matter which.
I've also tried replacing initWithBytes:length:encoding: with initWithCString:encoding:. I didn't check every possible case, but in all the ones I tested it behaves the same as initWithBytes:. I performed my tests on iOS 6.0.
Does anyone have an idea how to deal with UTF-7 encoded strings properly?
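For now, based on the observation above that appending an ASCII character makes the decode succeed, I'm considering the stopgap below: pad the buffer with one ASCII byte before decoding and strip the extra character afterwards. The helper name is my own, and I can't say whether this is safe in general; it's only a sketch of the idea.

```objc
// Workaround sketch (my own helper, not an API): append one ASCII byte so
// the buffer never ends exactly on a complete UTF-7 shift sequence, then
// remove the corresponding character from the decoded result.
static NSString *decodeUTF7Padded(const char *bytes, NSUInteger length,
                                  NSStringEncoding encoding)
{
    NSMutableData *data = [NSMutableData dataWithBytes:bytes length:length];
    [data appendBytes:" " length:1];  // any plain ASCII byte seems to work
    NSString *padded = [[NSString alloc] initWithData:data
                                             encoding:encoding];
    if (padded == nil || padded.length == 0) {
        return nil;  // decode still failed, so it's not this length issue
    }
    return [padded substringToIndex:padded.length - 1];  // drop the pad
}
```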