This is part of my Instagram account backup:
```json
[
  {
    "media": [
      {
        "title": "\u00d0\u0094\u00d0\u00be\u00d1\u0080\u00d0\u00be\u00d0\u00b3\u00d0\u00be\u00d0\u00b9 \u00d0\u00b4\u00d1\u0080\u00d1\u0083\u00d0\u00b3"
      }
    ]
  }
]
```
To parse this, I use `Codable`:

```swift
struct BlogPost: Codable {
    let media: [Media]
}

struct Media: Codable {
    let title: String
}
```
But the following code prints `ÐÐ¾ÑÐ¾Ð³Ð¾Ð¹ Ð´ÑÑÐ³`:

```swift
import Foundation

let bundle = Bundle.main
let path = bundle.path(forResource: "posts_1", ofType: "json")
let content = try? String(contentsOfFile: path!, encoding: .utf8)
let data = content!.data(using: .utf8)!
let result = try? JSONDecoder().decode([BlogPost].self, from: data)
print(result![0].media[0].title)
```
It should print `Дорогой друг` instead. How can I decode this string correctly on iOS? I am also using mothereff.in to decode the backup data.
Let's start by summarizing some details. Instagram is encoding the string `"Дорогой друг"` as `"\u00d0\u0094\u00d0\u00be\u00d1\u0080\u00d0\u00be\u00d0\u00b3\u00d0\u00be\u00d0\u00b9 \u00d0\u00b4\u00d1\u0080\u00d1\u0083\u00d0\u00b3"`.

Let's look at what this means. The `Д` is the Unicode character U+0414. Its UTF-8 encoding is `D0 94`. Note that the encoded title in the JSON begins with `\u00d0\u0094`. Next, the `о` is the Unicode character U+043E, with a UTF-8 encoding of `D0 BE`. And sure enough, the next values in the encoded title are `\u00d0\u00be`. So Instagram seems to encode the string as UTF-8 and then write each byte as its own `\uxxxx` escape, at least for the Cyrillic characters. The space is encoded as a regular space character.

The problem is that `JSONDecoder` treats each `\uxxxx` escape as the Unicode code point itself, as the JSON specification requires, not as one byte of a UTF-8 encoding. When it parses the title, it first sees `\u00d0`, which is the Unicode character `Ð`. Then it sees `\u0094`, which is the Unicode character "CANCEL CHARACTER", a non-printable control character. This continues, and you end up with `"ÐÐ¾ÑÐ¾Ð³Ð¾Ð¹ Ð´ÑÑÐ³"` (the control characters are invisible when printed).

`JSONDecoder` has no built-in option for Instagram's non-standard encoding of strings, so the solution is to write a custom decoder that repairs the string after it has been decoded. A working approach is to give your `Media` struct a custom `init(from:)` that decodes `title` as a normal `String` and then fixes it. Every character in the decoded string is in the range U+0000 to U+00FF, so each one can be turned back into the byte with the same value (which is exactly what ISO Latin-1 encoding does), and the resulting bytes can then be decoded as UTF-8.

This is fine if there's only the one value to handle. If you need to deal with this for more than one property, move the logic to a `String` extension and call it from each `init(from:)` that needs it.

Here's a complete example that can be run in a Playground:
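The sketch below embeds the JSON as a raw string literal in place of `posts_1.json` so it runs on its own; `fixingInstagramEncoding` is a hypothetical name for the `String` extension. Instead of calling `data(using: .isoLatin1)`, it maps each Unicode scalar to a byte by hand, which makes the U+00FF limit explicit. Anything it cannot repair is returned unchanged.

```swift
import Foundation

struct BlogPost: Codable {
    let media: [Media]
}

struct Media: Codable {
    let title: String

    enum CodingKeys: String, CodingKey {
        case title
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Decode normally, then undo the one-escape-per-byte encoding.
        title = try container.decode(String.self, forKey: .title).fixingInstagramEncoding
    }
}

extension String {
    /// Hypothetical helper: treats each character as one byte
    /// (U+0000...U+00FF -> 0x00...0xFF) and decodes the bytes as UTF-8.
    /// Returns `self` unchanged if any character is above U+00FF
    /// or if the bytes are not valid UTF-8.
    var fixingInstagramEncoding: String {
        var bytes: [UInt8] = []
        for scalar in unicodeScalars {
            guard scalar.value <= 0xFF else { return self }
            bytes.append(UInt8(scalar.value))
        }
        return String(bytes: bytes, encoding: .utf8) ?? self
    }
}

// Stand-in for posts_1.json; the raw string keeps the \u escapes literal.
let json = #"""
[
  {
    "media": [
      {
        "title": "\u00d0\u0094\u00d0\u00be\u00d1\u0080\u00d0\u00be\u00d0\u00b3\u00d0\u00be\u00d0\u00b9 \u00d0\u00b4\u00d1\u0080\u00d1\u0083\u00d0\u00b3"
      }
    ]
  }
]
"""#

do {
    let posts = try JSONDecoder().decode([BlogPost].self, from: Data(json.utf8))
    print(posts[0].media[0].title)
} catch {
    print("Decoding failed:", error)
}
```

Because of the fallback, plain ASCII and text that was already decoded correctly (for example, real Cyrillic characters above U+00FF) pass through untouched. Text made only of characters up to U+00FF that also happens to form valid UTF-8, such as a literal `Ã©`, would be converted, though.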
Output: `Дорогой друг`
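When many types and properties in the backup need the same repair, writing `init(from:)` for each one gets repetitive. One alternative, sketched here as a different technique rather than part of the solution above, is to parse the file with `JSONSerialization`, repair every string value in the resulting tree, serialize it again, and only then hand it to `JSONDecoder`, so the `Codable` structs can stay exactly as they were in the question. `decodeInstagramBackup` and `repairingStrings(in:)` are hypothetical names, and dictionary keys are left alone on the assumption that they are plain ASCII.

```swift
import Foundation

extension String {
    /// Hypothetical helper: one character -> one byte, then decode as UTF-8;
    /// returns `self` if that isn't possible.
    var fixingInstagramEncoding: String {
        var bytes: [UInt8] = []
        for scalar in unicodeScalars {
            guard scalar.value <= 0xFF else { return self }
            bytes.append(UInt8(scalar.value))
        }
        return String(bytes: bytes, encoding: .utf8) ?? self
    }
}

/// Recursively repairs every string value in a JSONSerialization object tree.
func repairingStrings(in value: Any) -> Any {
    switch value {
    case let string as String:
        return string.fixingInstagramEncoding
    case let array as [Any]:
        return array.map(repairingStrings(in:))
    case let dictionary as [String: Any]:
        return dictionary.mapValues(repairingStrings(in:))
    default:
        return value // numbers, booleans and NSNull pass through unchanged
    }
}

/// Repairs all strings first, then decodes with plain JSONDecoder.
func decodeInstagramBackup<T: Decodable>(_ type: T.Type, from data: Data) throws -> T {
    let object = try JSONSerialization.jsonObject(with: data)
    let repaired = try JSONSerialization.data(withJSONObject: repairingStrings(in: object))
    return try JSONDecoder().decode(type, from: repaired)
}

// The structs from the question, unchanged.
struct BlogPost: Codable { let media: [Media] }
struct Media: Codable { let title: String }

let json = #"[{"media":[{"title":"\u00d0\u0094\u00d0\u00be\u00d1\u0080\u00d0\u00be\u00d0\u00b3\u00d0\u00be\u00d0\u00b9 \u00d0\u00b4\u00d1\u0080\u00d1\u0083\u00d0\u00b3"}]}]"#

do {
    let posts = try decodeInstagramBackup([BlogPost].self, from: Data(json.utf8))
    print(posts[0].media[0].title)
} catch {
    print("Decoding failed:", error)
}
```

The trade-off is an extra parse and serialization pass over the whole file in exchange for not touching any of the model types.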
Note that this solution works with the provided example. It's possible that Instagram encodes some characters in a way this solution doesn't handle; without more data I can't know for sure. If you come across an example that this code doesn't handle correctly, post a comment with the relevant details.
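To gather those details, it helps to look at the code points of a decoded string and compare them with the UTF-8 bytes of the text you expected. The helper functions below (hypothetical names, standard library only) print both. For `Д`, the code point is U+0414 and the UTF-8 bytes are `D0 94`, matching the `\u00d0\u0094` escapes in the backup.

```swift
/// Formats `value` as uppercase hex, padded with leading zeros to `width` digits.
func hex<T: BinaryInteger>(_ value: T, width: Int) -> String {
    let digits = String(value, radix: 16, uppercase: true)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

/// Lists each Unicode scalar of `text` as U+XXXX, so invisible control
/// characters such as U+0094 become visible.
func scalarDump(_ text: String) -> String {
    text.unicodeScalars.map { "U+" + hex($0.value, width: 4) }.joined(separator: " ")
}

/// Lists the UTF-8 bytes of `text` as two-digit hex values.
func utf8Dump(_ text: String) -> String {
    text.utf8.map { hex($0, width: 2) }.joined(separator: " ")
}

print(scalarDump("Д"))            // U+0414
print(utf8Dump("Д"))              // D0 94
print(scalarDump("\u{D0}\u{94}")) // U+00D0 U+0094 (what JSONDecoder produced)
print(utf8Dump("Дорогой друг"))   // compare with the escapes in the backup
```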