Trouble reading in Unicode strings from CSV file to DictReader in Python

1.4k Views Asked by At

I have a CSV file I'm trying to read in using DictReader.

But doing just this:

with("BeerRatings.csv", "r", "utf-8") as f:
    reader = csv.DictReader(f)
    for line in reader:
        print line

gives me some ugly unicode as such:

{'Rating': '4', 'Brewery': 'Tr\xc3\xb6egs Brewing Company', 'Beer name': 'Tr\xc3\xb6egs Hopback Amber Ale'}
{'Rating': '4.59', 'Brewery': 'Brasserie Dieu Du Ciel', 'Beer name': 'P\xc3\xa9ch\xc3\xa9 Mortel - Bourbon Barrel Aged'} etc.

So, reading on stackoverflow, I editted my code to this, using the codecs module:

import codecs

with codecs.open("BeerRatings.csv", "r", "utf-8") as f:
    reader = csv.DictReader(f)
    for line in reader:
        print line

But this is giving me a UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 9: ordinal not in range(128).

Any tips on how to go fix this?

UPDATE aka more flailing around:

def UnicodeDictReader(utf8_data, **kwargs):
    csv_reader = csv.DictReader(utf8_data, **kwargs)
    for row in csv_reader:
        yield {key: unicode(value, 'utf-8') for key, value in row.iteritems()}

with open("BeerRatings.csv", "r") as f:
    reader = UnicodeDictReader(f)
    for line in reader:
        print line

THis still gives me a less than ideal output...

{'Rating': u'4', 'Brewery': u'Tr\xf6egs Brewing Company', 'Beer name': u'Tr\xf6egs Hopback Amber Ale'}
{'Rating': u'4.59', 'Brewery': u'Brasserie Dieu Du Ciel', 'Beer name': u'P\xe9ch\xe9 Mortel - Bourbon Barrel Aged'}
1

There are 1 best solutions below

8
On BEST ANSWER

The csv module in Python 2.X expects the input file to be opened in binary, and does not support encodings. It is, however, compatible with UTF-8, but you have to decode to Unicode yourself:

import csv

with open('BeerRatings.csv','rb') as f:
    reader = csv.DictReader(f)
    for line in reader:
        for k,v in line.iteritems():
            print k.decode('utf8'),':',v.decode('utf8')
        print

Output:

Rating : 4
Brewery : Tröegs Brewing Company
Beer name : Tröegs Hopback Amber Ale

Rating : 4.59
Brewery : Brasserie Dieu Du Ciel
Beer name : Péché Mortel - Bourbon Barrel Aged

Edit

Per your UnicodeDictReader, you still need to print the key/value pairs as I did or you get the default printing for a dict, which shows escaped data via the repr() of the string. Also open in binary mode. It matters on some OSes, particularly Windows.

import csv

def UnicodeDictReader(utf8_data, **kwargs):
    csv_reader = csv.DictReader(utf8_data, **kwargs)
    for row in csv_reader:
        yield {key.decode('utf8'):value.decode('utf8') for key, value in row.iteritems()}

def prettydict(D):
    return u'{' + u', '.join(u"'{}': '{}'".format(k,v) for k,v in D.iteritems()) + u'}'

with open("BeerRatings.csv", "rb") as f:
    reader = UnicodeDictReader(f)
    for line in reader:
        print prettydict(line)

Output:

{'Rating': '4', 'Brewery': 'Tröegs Brewing Company', 'Beer name': 'Tröegs Hopback Amber Ale'}
{'Rating': '4.59', 'Brewery': 'Brasserie Dieu Du Ciel', 'Beer name': 'Péché Mortel - Bourbon Barrel Aged'}