When I run the following code in Python:
line = "áaáaáaá"
for c in line:
    print c
My output is:
� � a � � a � � a � �
How can I fix this?
I tried the following in the Python interpreter to understand what is going on. I hope these findings help.
>>> line = "áaáaáaá"
>>> line
'\xc3\xa1a\xc3\xa1a\xc3\xa1a\xc3\xa1'
The entire line is stored as a UTF-8 byte string, not as Unicode text. Note that á is encoded as the two bytes \xc3\xa1.
line = "áaáaáaá"
for c in line:
    print c
The loop walks the string one byte at a time: '\xc3', '\xa1', 'a', '\xc3', ...
Each half of a two-byte sequence is invalid on its own, so the output looks like � � a � � a � � a � �
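To make the byte split concrete, here is a minimal sketch. It assumes the source string was UTF-8 encoded (which the `\xc3\xa1` pairs suggest), and it builds the bytes explicitly so that it behaves the same on Python 2 and Python 3. The names `text`, `raw`, and `pieces` are just illustrative.

```python
# -*- coding: utf-8 -*-
# Sketch: why iterating over UTF-8 bytes splits each accented letter in two.
text = u"\xe1a\xe1a\xe1a\xe1"   # the text from the question, as Unicode code points
raw = text.encode("utf-8")      # the bytes a Python 2 str literal would hold

# Slice one byte at a time; this mirrors what `for c in line` does in Python 2.
pieces = [raw[i:i + 1] for i in range(len(raw))]

print(len(text))    # 7 characters
print(len(raw))     # 11 bytes: 4 two-byte letters plus 3 one-byte letters
print(pieces[:3])   # the two halves of the first letter, then "a"
```

The mismatch between 7 characters and 11 bytes is the whole problem: the loop in the question yields 11 items, and 8 of them are halves of a character.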
So if you instead specify something like this:
>>> line = unicode("áaáaáaá", encoding="utf-8")
>>> line
u'\xe1a\xe1a\xe1a\xe1'
This decodes the UTF-8 bytes into a unicode string in which every character, including á (U+00E1), is a single element.
Now the loop walks the string like this: u'\xe1', u'a', u'\xe1', u'a', u'\xe1', u'a', ...
and the output is something like á a á a á a á
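As a rough illustration of the fix, the sketch below decodes the same bytes and then iterates over characters instead of bytes. It uses `bytes.decode("utf-8")`, which does the same job as the `unicode(..., encoding="utf-8")` call above and also exists on Python 3. It prints code points rather than the characters themselves so that the result does not depend on your console encoding.

```python
# Sketch: decode first, then iterate over characters instead of bytes.
raw = b"\xc3\xa1a\xc3\xa1a\xc3\xa1a\xc3\xa1"   # UTF-8 bytes of the string in the question
text = raw.decode("utf-8")                     # now a unicode string

chars = [c for c in text]                      # one item per character
print(len(chars))                              # 7
print([hex(ord(c)) for c in chars])            # 0xe1 and 0x61 alternating
```

In real code you would normally decode once, as close to the input as possible, and work with unicode strings everywhere after that.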
I googled this problem a bit and found something here:
http://eclipsesource.com/blogs/2013/02/21/pro-tip-unicode-characters-in-the-eclipse-console/
Try going to the Launch Configuration dialog > Common and setting the encoding to utf-8 or latin-1.
If this doesn't solve the problem, try converting each character from utf-8 before printing it.
Edit: Here's some explanation :)
When you don't specify the encoding as utf-8, the interpreter breaks the string into the wrong parts. For example, á is stored as '\xc3\xa1'. In the loop, Python treats \xc3 and \xa1 as two separate characters and prints each one on its own.
Why does it work when you specify the encoding, then? Well, I'm sure you've got it already. When you set the encoding to utf-8, it treats the string as utf-8 and knows that \xc3\xa1 is one character.
The second method works even if you don't set the encoding to utf-8. Why? Because it converts the encoding from utf-8 to what your interpreter uses.
Hope this helps!
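The original snippet for the second method is not shown here, so the following is only a guess at the idea of converting from utf-8 to the console's encoding. `to_console_bytes` is a hypothetical helper, not something from the standard library, and falling back to utf-8 when the console encoding is unknown is my own assumption.

```python
import sys

def to_console_bytes(raw, target=None):
    """Decode UTF-8 bytes and re-encode them for the console (hypothetical helper)."""
    # sys.stdout.encoding can be None (for example when output is piped),
    # so fall back to UTF-8 in that case. This fallback is an assumption.
    target = target or getattr(sys.stdout, "encoding", None) or "utf-8"
    # "replace" substitutes "?" for characters the target encoding lacks.
    return raw.decode("utf-8").encode(target, "replace")

raw = b"\xc3\xa1"                               # UTF-8 bytes of one accented letter
print(repr(to_console_bytes(raw, "latin-1")))   # a single byte, 0xe1
print(repr(to_console_bytes(raw, "utf-8")))     # the same two bytes as the input
```

This also shows why the latin-1 console setting can work: once the string is decoded, the letter re-encodes to the single byte 0xe1 that a latin-1 console expects.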