Python encoding/decoding error for 'cp866'


I am using Python 3.6.5 and I am trying to extract some information from a CSV file. The file is written in Russian, so I need to use 'cp866' to decode it. However, I can't get the correct output.

This is the code that I use:

def printcsv():
    with open('vocabulary.csv',newline='') as f:
      reader = csv.reader(f)
      for row in reader:
          #store in array
          print(row.decode('cp866'))

This is the error that I got:

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa7 in position 0: ordinal not in range(128)

There is 1 best solution below

Oops, that is not the correct way to read an encoded CSV file. Here is what your code does:

with open('vocabulary.csv',newline='') as f: # open the file with default system encoding
  reader = csv.reader(f)                     # declare a reader on it
  for row in reader:                         # here comes the problem

I assume that your system uses ASCII as the default encoding. When the reader loads a row, a line of bytes is read from the file and decoded to a string with that default ASCII encoding, which fails on the first non-ASCII byte (here 0xa7).

In any case, row is a list and not a string, so row.decode would have raised an error even if you had reached that line. In Python 3 the strings inside it are already decoded and have no decode method either.
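To make the failure concrete, here is a minimal sketch that reproduces the same error message without any file. The letter "з" is only an assumption chosen for illustration, because it happens to encode to the byte 0xa7 in cp866. The actual content of vocabulary.csv is not known.

```python
# Assumption: the first byte of the file is 0xa7, which is the letter "з" in cp866.
raw = "з".encode("cp866")

# Decoding those bytes as ASCII fails, just like in the traceback.
try:
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding them with the right codec gives the text back.
text = raw.decode("cp866")

# A row produced by csv.reader is a list of str, and a list has no decode method.
row = [text, "word"]
has_decode = hasattr(row, "decode")

print(message)
print(text, has_decode)
```

So the decoding has to happen when the bytes are read from the file, not on the rows that the reader returns.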

The correct way is to specify the encoding of the file when opening it:

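import csv

def printcsv():
    with open('vocabulary.csv', newline='', encoding='cp866') as f:
        reader = csv.reader(f)
        for row in reader:
            # store in an array here; each row is already a list of decoded str fields
            print(row)

However, I am unsure whether print(row) will display the text correctly.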

Depending on the encoding used by sys.stdout, you may have to explicitly encode each string in the row, where encoding is the name of the encoding your terminal expects:

            print([field.encode(encoding) for field in row])
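As a rough end-to-end check, the sketch below writes a small cp866 file, reads it back with encoding='cp866', and then encodes the output explicitly. The sample words, the temporary file, and the helper names read_rows and write_rows are all hypothetical and only there so the example runs on its own. Printing a list of bytes objects in Python 3 shows b'...' literals, so this sketch writes the encoded bytes to a binary stream instead. An in-memory buffer stands in for sys.stdout.buffer here.

```python
import csv
import io
import os
import sys
import tempfile

def read_rows(path, encoding="cp866"):
    """Return all CSV rows as lists of str, decoding the file with `encoding`."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))

def write_rows(rows, stream, encoding):
    """Write rows to a binary stream, encoding the text explicitly."""
    for row in rows:
        line = ", ".join(row) + "\n"
        # errors="replace" substitutes "?" for characters the target encoding lacks
        stream.write(line.encode(encoding, errors="replace"))

# Build a small cp866 sample file (hypothetical content) so the sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), "vocabulary.csv")
with open(sample_path, "w", newline="", encoding="cp866") as f:
    csv.writer(f).writerows([["слово", "word"], ["зима", "winter"]])

rows = read_rows(sample_path)

# Use the encoding reported by sys.stdout when there is one, otherwise fall back to UTF-8.
target = getattr(sys.stdout, "encoding", None) or "utf-8"

out = io.BytesIO()  # stand-in for sys.stdout.buffer
write_rows(rows, out, target)
```

If print(row) already shows readable Cyrillic on your terminal, none of the explicit encoding is needed. Opening the file with the right encoding is the part that fixes the original error.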