I have json file with values, that have characters like 'č', 'ė', etc. I want to insert those values to SQL Server database, but values I see in database are 'Ä' instead of 'č', 'Ä—' instead of 'ė', 'ÄŒ' instead of 'Č', etc. How should I deal with this problem?
I tried things, that were on StackOverflow and pyodbc wiki, like:
self.connection.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
self.connection.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
self.connection.setencoding(encoding='utf-8')
(also tried latin1 encoding, because I saw somewhere, that it could help).
I tried adding charset to connection string:
connection_string = (
f'DRIVER={{ODBC Driver 18 for SQL Server}};'
f'SERVER={server};'
f'DATABASE={database};'
'Trusted_Connection=yes;'
'Encrypt=no;'
'CHARSET=UTF-8;'
)
But nothing solved my problem.
This is how I read my file:
def read_values(file_path):
with open(file_path, 'r', encoding='utf-8') as file:
return json.load(file)
This is how I insert my data:
def insert(self, table_name: str, column_values: dict):
columns = ', '.join(column_values.keys())
values = ', '.join([f"'{value}'" for value in column_values.values()])
query = f"INSERT INTO {table_name} ({columns}) VALUES ({values})"
print(query)
try:
cursor = self.connection.cursor()
cursor.execute(query)
cursor.execute("SELECT @@IDENTITY AS ID;")
last_inserted_id = cursor.fetchone()[0]
self.connection.commit()
print("Data inserted successfully.")
return last_inserted_id
except Exception as e:
print(f"Error inserting data: {e}")
finally:
cursor.close()
print(query) prints the right query, with right characters.
If I write the right characters directly into the database with my keyboard, everything works fine.
Issue 1: Don't use
.setencoding()and.setdecoding()for SQL Server.As mentioned in the pyodbc wiki entry for SQL Server, they are not necessary.
Issue 2: Don't use string formatting (a.k.a. dynamic SQL) to inject column values into an SQL statement.
The real problem you are having is that you are constructing an INSERT statement that looks something like this:
SQL Server uses
'string'for varchar andN'string'for nvarchar, so a Unicode literal should beN'Renée'.But that would still be bad form. Use a parameterized query instead. For example