I am working on an Objective-C application, and I am reading the dictionary representation of NSUserDefaults with this code:
```objc
NSUserDefaults *defaults = [NSUserDefaults standardUserDefaults];
NSDictionary *userDefaultsDict = [defaults dictionaryRepresentation];
```
While enumerating the keys and objects of the resulting dictionary, I sometimes find a kind of opaque string, which you can see in the following picture:
So it seems like an encoding problem.
If I print the description of the string, the debugger shows it correctly:
Printing description of obj:
tsuqsx
However, if I write obj to a file, or use it in any other way, I get unreadable output like this:
What I would like to achieve is the following:
Detect in some way that the string has this encoding problem.
Convert the string to UTF-8 so that I can use it in the rest of the program.
Any help is greatly appreciated. Thanks
EDIT: A very hacky possible solution that helps explain what I am trying to do.
After trying every approach based on dataUsingEncoding: and back, I ended up with the following solution. It's absolutely weird, but I'm posting it here in the hope that it helps somebody guess the encoding and what to do with the unprintable characters:
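```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>
#include <string.h>

// Heuristic for the values I'm seeing: the first byte looks like a length,
// every third byte starting at index 1 is a printable character, and the
// bytes in between are control characters in the range 0x10...0x1F.
- (BOOL)isProblematicString:(NSString *)candidateString {
    if ([candidateString length] <= 2) {
        return NO;
    }
    const char *temp = [candidateString UTF8String];
    if (temp == NULL || strlen(temp) < 2) {
        return NO;
    }
    // Read the "length" byte as unsigned and never scan past the terminator.
    long length = (unsigned char)temp[0];
    long available = (long)strlen(temp) - 1;
    if (length > available) {
        length = available;
    }
    BOOL returnValue = YES;
    for (long ctr = 1; ctr <= length; ctr++) {
        // Positions 1, 4, 7, ... hold the useful characters; every other
        // position must hold a control character 0x10...0x1F.
        if ((ctr - 1) % 3 != 0 && ctr < (long)[candidateString length]) {
            unsigned char c = (unsigned char)temp[ctr];
            if (c < 0x10 || c > 0x1F) {
                returnValue = NO;
            }
        }
    }
    return returnValue;
}

// Rebuilds the string from the bytes at positions 1, 4, 7, ...
- (NSString *)utf8StringFromUnknownEncodedString:(NSString *)originalUnknownString {
    const char *temp = [originalUnknownString UTF8String];
    if (temp == NULL || temp[0] == '\0') {
        return nil;
    }
    long length = (unsigned char)temp[0];
    long available = (long)strlen(temp) - 1;
    if (length > available) {
        length = available;
    }
    char *dest = malloc(length + 1);
    if (dest == NULL) {
        return nil;
    }
    long written = 0;
    for (long ctr = 1; ctr <= length; ctr++) {
        if ((ctr - 1) % 3 == 0) {
            dest[written++] = temp[ctr];
        }
    }
    dest[written] = '\0';  // terminate right after the last copied byte
    NSString *returnValue = [[NSString alloc] initWithUTF8String:dest];
    free(dest);
    return returnValue;
}
```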
This returns a string that I can use to build a proper UTF-8 string, but I am looking for a clean solution.


We're talking about a string that comes from /Library/Preferences/.GlobalPreferences.plist (key com.apple.preferences.timezone.new.selected_city). This string has no encoding problem, and characters like \x1a, \x1c, ... are valid characters. You can call dataUsingEncoding: with ASCII, UTF-8, ..., but all these characters will still be present. They're called control characters (or non-printing characters). The linked Wikipedia page explains what these characters are and how they're defined in ASCII, extended ASCII and Unicode.
What you're looking for is a way to remove control characters from a string.
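For the first point of the question (detecting such strings), a minimal sketch: `rangeOfCharacterFromSet:` reports whether any character of the string belongs to `NSCharacterSet.controlCharacterSet`. The helper name `ContainsControlCharacters` is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES if `string` contains at least one
// character from NSCharacterSet.controlCharacterSet (categories Cc and Cf).
static BOOL ContainsControlCharacters(NSString *string) {
    NSRange range = [string rangeOfCharacterFromSet:[NSCharacterSet controlCharacterSet]];
    return range.location != NSNotFound;
}
```

This only tells you that such characters are present; it says nothing about the encoding, which is fine here because there is no encoding problem to detect.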
Remove control characters
We can create a category for our new method:
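A sketch of such a category declaration, plus a helper returning the sample value used throughout this answer. The category name `ControlCharacters`, the method `stringByRemovingControlCharacters` and the helper `SampleCity` are hypothetical names, not necessarily the ones from the original answer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical category; each approach below supplies an implementation
// of -stringByRemovingControlCharacters.
@interface NSString (ControlCharacters)
- (NSString *)stringByRemovingControlCharacters;
@end

// Sample value quoted in this answer: control characters interleaved with
// the printable letters "tsuqsx".
static NSString *SampleCity(void) {
    return @"\x1at\x1c\x1ds\x1d\x1cu\x16\x1fq\x1d\x1bs\x17\x1ax\x10";
}
```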
In all examples below, the city variable is created in this way ...... and contains @"\x1at\x1c\x1ds\x1d\x1cu\x16\x1fq\x1d\x1bs\x17\x1ax\x10". Also, all examples below were tested with the following code:
Split & join
One way is to use NSCharacterSet.controlCharacterSet. There's a stringByTrimmingCharactersInSet: method (NSString), but it removes these characters only from the beginning and end, which is not what you're looking for. There's a trick you can use instead: split the string by control characters and then join the components back together. It's not a very efficient approach, but it works.
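A minimal sketch of the split & join trick, assuming ARC. The category and method names are hypothetical.

```objc
#import <Foundation/Foundation.h>

@interface NSString (ControlCharacters)
- (NSString *)stringByRemovingControlCharacters;
@end

@implementation NSString (ControlCharacters)
- (NSString *)stringByRemovingControlCharacters {
    NSCharacterSet *controls = [NSCharacterSet controlCharacterSet];
    // Every control character acts as a separator; consecutive ones yield
    // empty components, which disappear when joined with @"".
    NSArray<NSString *> *parts = [self componentsSeparatedByCharactersInSet:controls];
    return [parts componentsJoinedByString:@""];
}
@end
```

With the sample value above, `[city stringByRemovingControlCharacters]` yields `@"tsuqsx"`. The intermediate array is what makes this approach less efficient than the others.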
ICU transform
Another way is to use an ICU transform (see the ICU User Guide). There's a stringByApplyingTransform:reverse: method (NSString), but it only accepts predefined constants. Documentation says:
Let's update our implementation:
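A sketch of the ICU approach going through `CFStringTransform`, whose documentation allows ICU transform IDs in addition to the predefined identifiers. Assumptions: ARC, the hypothetical category and method names, and the compound ID `[[:Cc:] [:Cf:]] Remove` (ICU's `Remove` transliterator restricted by a filter set). That ID is my own choice and not necessarily the one from the original answer.

```objc
#import <Foundation/Foundation.h>

@interface NSString (ControlCharacters)
- (NSString *)stringByRemovingControlCharacters;
@end

@implementation NSString (ControlCharacters)
- (NSString *)stringByRemovingControlCharacters {
    NSMutableString *result = [self mutableCopy];
    // The filter [[:Cc:] [:Cf:]] limits the Remove transliterator to
    // control (Cc) and format (Cf) characters.
    Boolean ok = CFStringTransform((__bridge CFMutableStringRef)result, NULL,
                                   CFSTR("[[:Cc:] [:Cf:]] Remove"), false);
    // CFStringTransform returns false if the transform could not be applied;
    // in that case return the string unchanged.
    return ok ? [result copy] : [self copy];
}
@end
```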
[:Cc:] represents control characters and [:Cf:] represents format characters. Together they form the same character set as the already mentioned NSCharacterSet.controlCharacterSet. Documentation:
Iterate over characters
NSCharacterSet also offers the characterIsMember: method. Here we need to iterate over the characters (unichar) and check whether each one is a control character. Let's update our implementation:
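A sketch of the iteration approach, assuming ARC and the same hypothetical names. It copies the UTF-16 code units into a buffer with `getCharacters:range:` and compacts the buffer in place, keeping only units for which `characterIsMember:` returns NO.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>

@interface NSString (ControlCharacters)
- (NSString *)stringByRemovingControlCharacters;
@end

@implementation NSString (ControlCharacters)
- (NSString *)stringByRemovingControlCharacters {
    NSUInteger length = self.length;
    if (length == 0) {
        return [self copy];
    }
    unichar *buffer = malloc(length * sizeof(unichar));
    if (buffer == NULL) {
        return [self copy];  // allocation failed; leave the string unchanged
    }
    [self getCharacters:buffer range:NSMakeRange(0, length)];

    NSCharacterSet *controls = [NSCharacterSet controlCharacterSet];
    NSUInteger kept = 0;
    // Keep only the code units that are not control characters.
    for (NSUInteger i = 0; i < length; i++) {
        if (![controls characterIsMember:buffer[i]]) {
            buffer[kept++] = buffer[i];
        }
    }
    NSString *result = [NSString stringWithCharacters:buffer length:kept];
    free(buffer);
    return result;
}
@end
```

Because characterIsMember: takes a single unichar, this loop works on UTF-16 code units: format characters outside the BMP (stored as surrogate pairs) are not filtered by it.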
Here we filter out all characters (unichar) that belong to the controlCharacterSet.
Other ways
There are other ways to iterate over characters; see, for example, Most efficient way to iterate over all the chars in an NSString.
BBEdit & others
Let's write this string to a file:
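A sketch of writing the value to a file and reading it back, to show that the control characters survive the round trip unchanged. The helper name `RoundTripThroughFile` and the file name are hypothetical; the file goes to `NSTemporaryDirectory()`.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: writes `string` as UTF-8 to a temporary file and
// reads it back, returning nil if either step fails.
static NSString *RoundTripThroughFile(NSString *string) {
    NSString *path = [NSTemporaryDirectory()
        stringByAppendingPathComponent:@"control-characters.txt"];
    NSError *error = nil;
    if (![string writeToFile:path atomically:YES
                    encoding:NSUTF8StringEncoding error:&error]) {
        NSLog(@"Write failed: %@", error);
        return nil;
    }
    NSString *readBack = [NSString stringWithContentsOfFile:path
                                                   encoding:NSUTF8StringEncoding
                                                      error:&error];
    if (readBack == nil) {
        NSLog(@"Read failed: %@", error);
    }
    return readBack;
}
```

The string read back is equal to the original, control characters included; how they look when you open the file depends entirely on the editor.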
It's up to the editor how these control characters are handled and displayed. Here's an example in Visual Studio Code.
View - Render Control Characters off:
View - Render Control Characters on:
BBEdit displays (upside-down) question marks, but I'm sure there's a way to toggle control-character rendering. I don't have BBEdit installed to verify it.