UnicodeDecodeError: 'utf-16-le'

33.4k Views Asked by At

I'm trying to pull a list of tables from an access database. The script goes through and displays about a third of the tables and gets the following error:

Traceback (most recent call last): File "C:/GageInfo/sourcecode for GageTrakNotify/__test script.py", line 31, in for fld in cursor2.columns(rows.table_name): UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 132-133: illegal encoding

Any tips would be appreciated.

import pyodbc


cursor1 = conn.cursor()
cursor2 = conn.cursor()

tblCount = 0
for rows in cursor1.tables():
   if rows.table_type == "TABLE":
       tblCount += 1
       print(rows.table_name)
       for fld in cursor2.columns(rows.table_name):
           print(fld.table_name, fld.column_name)

conn.close()
4

There are 4 best solutions below

0
On

Thanks for the tips and feedback. I resolved it by capturing the information in a list and then printing out results, and using a try statement to handle the error and apply the right unicode.

Your suggestions pointed me in the right direction.

0
On

It sounds like you have a Unicode (non-ASCII) embedded in a table name somewhere. Figuring out what table it's stopping on will confirm whether that's the case or not, but ultimately you want the Python script to just handle Unicode, which can be done with the string decode method:

for fld in cursor2.columns(rows.table_name.decode('utf-16-le')):

This should be done anywhere there is a possibility of Unicode characters cropping up (for instance, the column names as well).

0
On

This is almost certainly caused by a known issue with cursor.columns() when using the Access ODBC driver if the table metadata includes "Descriptions" for the columns. The following GitHub issue includes a discussion of the problem and possible workarounds:

https://github.com/mkleehammer/pyodbc/issues/328

TL;DR: The Access ODBC driver returns trailing garbage bytes under those circumstances and some combinations of the bytes will not be valid UTF-16LE encoded characters. A pyodbc output converter function can be used to intercept the byte string and decode it without throwing an exception. The specific workaround is described here:

https://github.com/mkleehammer/pyodbc/issues/328#issuecomment-419655266

0
On

This is my first answer, I apologize if it is very generic.

Set decoding with:

conn.setdecoding(pyodbc.SQL_WCHAR, encoding='latin-1')

Place it before conn.cursor(), also, you might be missing it:

conn_str = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=path\to\your\access\database.accdb;"
    r"unicode_results=True;"
)
conn = pyodbc.connect(conn_str)