Printing Unicode for Bengali


I'm using goslate for the Google Translate API.

I can translate Bengali to English:

>>> import goslate

>>> gs = goslate.Goslate()
>>> S = gs.translate("ভাল", 'en')
>>> S

good

But a problem arises when I want to translate English to Bengali.

>>> import goslate

>>> gs = goslate.Goslate()
>>> S = gs.translate("good", 'bn')
>>> S

Error:

return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>

What should I do?

print repr(S)
output: u'\u09ad\u09be\u09b2'

print("ভাল")
output: ভাল

print(u"ভাল") # this gives UnicodeEncodeError
There are 2 best solutions below

19
On BEST ANSWER

This works for me

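# coding: utf-8
import sys

# Python 2 only: site.py deletes sys.setdefaultencoding at startup;
# reload(sys) makes it available again.
reload(sys)

d = sys.getdefaultencoding()
if d != "utf-8":
    sys.setdefaultencoding('utf-8')
st = "ভাল"  # a UTF-8 byte string in Python 2
f = open('test.txt', 'w')
# encode() first decodes st implicitly, using the default encoding set above
f.write(st.encode('utf-8'))
f.close()
# restore the original default encoding
if d != "utf-8":
    sys.setdefaultencoding(d)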

This writes "ভাল" to test.txt as expected. print st.encode('utf-8') works too.
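For comparison, here is a minimal sketch of writing the same word to a file without changing the default encoding. It assumes a Unicode string and io.open, which takes an explicit encoding on both Python 2 and Python 3. The temporary path is made up for illustration.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# The Bengali word from the question, as a Unicode string.
text = u"\u09ad\u09be\u09b2"

# Hypothetical output path in a temporary directory, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# io.open accepts an explicit encoding, so the file is written as UTF-8
# regardless of the interpreter's default encoding.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the raw bytes back to see what was actually stored.
with io.open(path, "rb") as f:
    raw = f.read()
```

Decoding raw as UTF-8 should give back the original string, since each of the three characters takes three bytes in UTF-8.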

0
On

This is definitely unrelated to goslate. Your issue is making print u'\u09ad\u09be\u09b2' work when the Unicode characters can't be represented in the console's character encoding.
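The failure can be reproduced without goslate or a console at all. The sketch below assumes cp1252 as a stand-in for the console code page; the 'charmap' wording in the traceback suggests a codec of that family, but the actual code page on the asker's machine is not known.

```python
import sys

# The string returned by goslate in the question.
text = u"\u09ad\u09be\u09b2"

# cp1252 is an assumption: a typical legacy Windows code page that has
# no Bengali characters, so encoding should fail the same way print does.
try:
    text.encode("cp1252")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# The same string encodes without trouble as UTF-8.
utf8_bytes = text.encode("utf-8")

# The encoding that print would use for this process; it may be None
# when output is redirected on Python 2.
console_encoding = getattr(sys.stdout, "encoding", None)
```

Checking console_encoding on the affected machine is a quick way to confirm which codec print is actually using.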

You need to either change the console encoding to one that can represent the Unicode characters, such as utf-8, or use a Unicode API such as WriteConsoleW, assuming you are on Windows. If you are not on Windows, just configure your environment to use utf-8.
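To illustrate what "changing the encoding" means at the stream level, here is a rough sketch that wraps a byte stream in a UTF-8 text writer. The BytesIO object is only a stand-in for the console's underlying byte stream so the example runs anywhere; whether a real console then displays the bytes correctly depends on its own settings and font.

```python
import io

text = u"\u09ad\u09be\u09b2"

# Stand-in for the console's byte stream (an assumption for illustration).
byte_stream = io.BytesIO()

# Text written through this wrapper is encoded as UTF-8 instead of the
# console's legacy code page; newline="\n" disables newline translation.
writer = io.TextIOWrapper(byte_stream, encoding="utf-8", newline="\n")

writer.write(text + u"\n")
writer.flush()

encoded = byte_stream.getvalue()
```

No UnicodeEncodeError is raised here because UTF-8 can represent every Unicode character.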

Using WriteConsoleW directly is complicated, though there is a simple-to-use win_unicode_console package on Python 3. The latter link also shows how to save the printed Unicode text to a file (print Unicode, set PYTHONIOENCODING).
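The PYTHONIOENCODING approach can be sketched as follows. This example starts a child Python process with the variable set and captures its output as bytes, which is roughly what happens when output is redirected to a file. The child program and variable names are made up for illustration.

```python
import os
import subprocess
import sys

# Child program: prints the Bengali word from the question.
child_code = "print(u'\\u09ad\\u09be\\u09b2')"

# Copy the current environment and set PYTHONIOENCODING for the child only.
env = dict(os.environ, PYTHONIOENCODING="utf-8")

# The child's stdout is captured as raw bytes, much like a file redirect.
output = subprocess.check_output([sys.executable, "-c", child_code], env=env)

# Decode the captured bytes and drop the trailing newline.
decoded = output.decode("utf-8").strip()
```

On the command line the equivalent would be setting PYTHONIOENCODING=utf-8 in the shell before running the script and redirecting its output to a file.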