Python PyODBC and SQL Server encoding issue

53 Views Asked by At

I have json file with values, that have characters like 'č', 'ė', etc. I want to insert those values to SQL Server database, but values I see in database are 'Ä' instead of 'č', 'Ä—' instead of 'ė', 'ÄŒ' instead of 'Č', etc. How should I deal with this problem?

I tried things, that were on StackOverflow and pyodbc wiki, like:

self.connection.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
self.connection.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
self.connection.setencoding(encoding='utf-8')

(also tried latin1 encoding, because I saw somewhere, that it could help).

I tried adding charset to connection string:

connection_string = (
    f'DRIVER={{ODBC Driver 18 for SQL Server}};'
    f'SERVER={server};'
    f'DATABASE={database};'
    'Trusted_Connection=yes;'
    'Encrypt=no;'
    'CHARSET=UTF-8;'
)

But nothing solved my problem.

This is how I read my file:

def read_values(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return json.load(file)

This is how I insert my data:

def insert(self, table_name: str, column_values: dict):
    columns = ', '.join(column_values.keys())
    values = ', '.join([f"'{value}'" for value in column_values.values()])

    query = f"INSERT INTO {table_name} ({columns}) VALUES ({values})"
    print(query)

    try:
        cursor = self.connection.cursor()
        cursor.execute(query)
        cursor.execute("SELECT @@IDENTITY AS ID;")
        last_inserted_id = cursor.fetchone()[0]
        self.connection.commit()
        print("Data inserted successfully.")
        return last_inserted_id
    except Exception as e:
        print(f"Error inserting data: {e}")
    finally:
        cursor.close()

print(query) prints the right query, with right characters.

If I write the right characters directly into the database with my keyboard, everything works fine.

1

There are 1 best solutions below

0
Gord Thompson On

Issue 1: Don't use .setencoding() and .setdecoding() for SQL Server.

As mentioned in the pyodbc wiki entry for SQL Server, they are not necessary.

Issue 2: Don't use string formatting (a.k.a. dynamic SQL) to inject column values into an SQL statement.

The real problem you are having is that you are constructing an INSERT statement that looks something like this:

print(query)
# INSERT INTO so78154305 (firstname) VALUES ('Renée')

SQL Server uses 'string' for varchar and N'string' for nvarchar, so a Unicode literal should be N'Renée'.

But that would still be bad form. Use a parameterized query instead. For example

columns = ', '.join([f"[{key}]" for key in column_values.keys()])
placeholders = ', '.join("?" * len(column_values.values()))

query = f"INSERT INTO [{table_name}] ({columns}) VALUES ({placeholders})"
print(query)
# INSERT INTO [so78154305] ([firstname], [active]) VALUES (?, ?)

cursor.execute(query, tuple(column_values.values()))