This should be an easy one, I hope. I have a URL:
http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napol%C3%A9on.jpg
that is saved into a json file with this code:
paintings = get_all_paintings(marc_chagall)
with open('chagall.json', 'w') as fb:
    json.dump(paintings, fb)
In the file, the URL has become:
u'http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napol\xe9on.jpg'
I am able to get the original, usable, percent-encoded URL with this code:
p = u'http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napol\xe9on.jpg'
p = urllib.quote(p.encode('utf8'), safe='/:')
print repr(p)
> 'http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napol%C3%A9on.jpg'
Now comes the tricky part. I want to get this string:
http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napoléon.jpg
with the non-ascii character in napoléon intact. This is for naming purposes in the storage bucket, not for anything else. How can I produce this string?
Just print the unicode value:
Don't confuse the Python representation of a Unicode value (which deliberately uses escapes for non-ASCII characters, for ease of debugging and introspection) with the actual value.
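To make the distinction concrete, here is a small sketch (written so it should run on both Python 2 and 3; the variable names `escaped`, `literal` and `encoded` are just for illustration). It checks that the escaped spelling and the literal spelling are the same value, and that the escape JSON writes to the file is likewise only notation:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import json

# Two spellings of the same Unicode string: one with an escape, one literal.
escaped = u'kopeikin-and-napol\xe9on.jpg'
literal = u'kopeikin-and-napoléon.jpg'

# The escape is only notation; both names hold the identical value.
print(escaped == literal)      # True
print(len(u'napol\xe9on'))     # 8 characters, the é counts as one

# json.dumps escapes non-ASCII by default (ensure_ascii=True), so the file
# contains \u00e9, but loading it gives back the original value.
encoded = json.dumps(escaped)
print(encoded)                 # "kopeikin-and-napol\u00e9on.jpg"
print(json.loads(encoded) == literal)  # True
```

So whether you see `\xe9` in a `repr()` or `\u00e9` in the JSON file, the string you get back after loading already contains the é.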
Printing encodes the value to the codec used by your console or terminal, provided Python was able to detect it. My terminal is set to UTF-8, so Python encoded the U+00E9 Unicode code point to C3 A9 bytes and my terminal then interpreted that as UTF-8 and displayed the é.
This all just means that you already have the right value, but were thrown by the debugging output.
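If all you have on hand is the percent-encoded form, the same reasoning runs in reverse. A rough sketch, assuming UTF-8 is the encoding behind the percent-escapes (it is for this URL); `to_unicode` is a hypothetical helper name, and the try/except import is only there so the snippet runs on both Python 2 and 3:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

try:
    from urllib.parse import quote, unquote  # Python 3
except ImportError:
    from urllib import quote, unquote        # Python 2

p = u'http://uploads4.wikiart.org/images/marc-chagall/kopeikin-and-napol\xe9on.jpg'

# U+00E9 is encoded as the two bytes C3 A9 in UTF-8.
print(u'\xe9'.encode('utf-8') == b'\xc3\xa9')  # True

# Percent-encode the UTF-8 bytes, as in the question.
quoted = quote(p.encode('utf-8'), safe='/:')
print(quoted)


def to_unicode(percent_encoded):
    """Undo percent-encoding and return a Unicode string (hypothetical helper)."""
    raw = unquote(percent_encoded)
    # Python 2 hands back UTF-8 bytes here; Python 3 already returns text.
    return raw.decode('utf-8') if isinstance(raw, bytes) else raw


restored = to_unicode(quoted)
print(restored == p)  # True: same value as the original Unicode URL
```

Either route gives you the same Unicode string, which you can then use as the object name for your storage bucket.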