How can strings with non-ASCII characters be retrieved with OptParse?

Question

Asked by AudioBubble on 29 October 2012 at 12:48

I'm using the OptParse module to retrieve a string value. OptParse only supports str typed strings, not unicode ones.

So let's say I start my script with:

./someScript --some-option ééééé

French characters, such as 'é', being typed str, trigger UnicodeDecodeErrors when read in the code:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 99: ordinal not in range(128)

I played around a bit with the unicode built-in function, but either I get an error, or the character disappears:

>>> unicode('é');
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> unicode('é', errors='ignore');
u''

Is there anything I can do to use OptParse to retrieve unicode/utf-8 strings?

It seems that the string can be retrieved and printed OK, but then I try to use that string with SQLite (using the APSW module), and it tries to convert to unicode somehow with cursor.execute("..."), and then the error occurs.

Here is a sample program that causes the error:

#!/usr/bin/python
# coding: utf-8

import os, sys, optparse
parser = optparse.OptionParser()
parser.add_option("--some-option")
(opts, args) = parser.parse_args()
print unicode(opts.some_option)

Woot4Moo On 29 October 2012 at 12:56

I believe your error is related to the following:

For example, to write Unicode literals including the Euro currency symbol, the ISO-8859-15 encoding can be used, with the Euro symbol having the ordinal value 164. This script will print the value 8364 (the Unicode codepoint corresponding to the Euro symbol) and then exit:

# -*- coding: iso-8859-15 -*-

currency = u"€"
print ord(currency)

jro On 29 October 2012 at 13:16

You could decode the arguments before the parser handles them. Taking your example:

#!/usr/bin/python
# coding: utf-8
import os, sys, optparse
parser = optparse.OptionParser()
parser.add_option("--some-option")

# Decode the command line arguments to unicode
for i, a in enumerate(sys.argv):
    sys.argv[i] = a.decode('ISO-8859-15')

(opts, args) = parser.parse_args()
print type(opts.some_option), opts.some_option

This gives the following output:

C:\workspace>python file.py --some-option préférer
<type 'unicode'> préférer

I've chosen the ISO/IEC 8859-15 code page, as it seems the most appropriate for your case. Adapt if needed.
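A variation on the same idea, as a minimal sketch: rather than overwriting sys.argv in place, the decoded list can be handed straight to parse_args(). The helper name parse_decoded and the UTF-8 default are assumptions for illustration, and the byte literals stand in for what Python 2 would put in sys.argv[1:] on a UTF-8 terminal.

```python
# -*- coding: utf-8 -*-
import optparse


def parse_decoded(argv, encoding="utf-8"):
    """Decode byte-string arguments to text, then parse them.

    `parse_decoded` is a hypothetical helper; `encoding` should match
    the terminal's encoding (UTF-8 is only an assumption here).
    """
    parser = optparse.OptionParser()
    parser.add_option("--some-option")
    # Only byte strings need decoding; text arguments pass through as-is.
    decoded = [a.decode(encoding) if isinstance(a, bytes) else a for a in argv]
    return parser.parse_args(decoded)


# Simulated command line: --some-option préférer (UTF-8 encoded bytes)
opts, args = parse_decoded([b"--some-option", b"pr\xc3\xa9f\xc3\xa9rer"])
```

In a real script the call would be parse_decoded(sys.argv[1:], encoding), with the encoding chosen to match the console.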

lionyue On 29 October 2014 at 08:15

#!/usr/bin/python
# coding: utf-8

import os, sys, optparse

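# Python 2 only: reload(sys) restores setdefaultencoding(), which site.py removes at startup
reload(sys)
# Make implicit str -> unicode conversions use UTF-8 instead of ASCII (process-wide)
sys.setdefaultencoding('utf-8')

parser = optparse.OptionParser()
parser.add_option(u"--some-option")
(opts, args) = parser.parse_args()
# With the default encoding changed, this no longer raises for UTF-8 encoded input
print unicode(opts.some_option)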

Mark Tolonen On 30 October 2012 at 12:12 BEST ANSWER

Input is returned in the console encoding, so based on your updated example, use:

print opts.some_option.decode(sys.stdin.encoding)

unicode(opts.some_option) defaults to using ascii as the encoding.
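The difference between the two decodings can be sketched as follows. This is an illustration under assumptions, not part of the original answer: the helper names console_encoding and to_text are hypothetical, the UTF-8 fallback is a guess for the case where sys.stdin.encoding is None (for example when input is piped), and the byte literal simulates the 'é' that a UTF-8 terminal would pass in.

```python
# -*- coding: utf-8 -*-
import sys


def console_encoding(default="utf-8"):
    """Return the console encoding, or `default` if it is unavailable.

    sys.stdin.encoding can be None when input is redirected; the
    UTF-8 fallback is an assumption, not a universal rule.
    """
    return getattr(sys.stdin, "encoding", None) or default


def to_text(value, encoding=None):
    """Decode byte strings to text; return text values unchanged."""
    if not isinstance(value, bytes):
        return value
    return value.decode(encoding or console_encoding())


raw = b"\xc3\xa9"  # UTF-8 bytes for 'é'

# Decoding as ASCII fails, which is what unicode(raw) does by default in Python 2.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the encoding the bytes were actually written in succeeds.
text = to_text(raw, "utf-8")
```

Applied to the question's script, the call would look like to_text(opts.some_option), and the resulting text value can then be passed on to code that expects unicode.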
