I'm struggling to deal with what I think is corrupt data stored in an Oracle database when reading it in a python script. I have the following:
def output_type_handler(cursor, metadata):
if metadata.type_code is oracledb.DB_TYPE_VARCHAR:
return cursor.var(metadata.type_code,
arraysize=cursor.arraysize,
encoding_errors="replace")
mydb, mycursor = connectOracle()
mycursor.outputtypehandler = output_type_handler
mycursor.execute('''select file_id, md5_value,
case when file_content is not null
then
utl_raw.cast_to_varchar2(dbms_lob.substr(file_content))
else
''
end as FILE_CONTENT from archive_file where archive_id = 123 and file_name = 'file_name.txt' ''')
row = mycursor.fetchone()
print(row)
When I run this, I'm getting the following:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 647: invalid continuation byte
If I run this directly in SQL Developer, I get an output, with � in the file content. My SQL Developer settings are set to UTF-8, and my Oracle database NLS_NCHAR_CHARACTERSET
is set to AL32UTF8
. My understanding with the outputtypehandler
change is that my python script would also output a � in this instance.
I added some print()
commands inside of the def output_type_handler()
to verify that it is indeed being called, and I saw output, so it appears that it is, bu it also seems like the encoding_errors="replace"
is being ignored. What am I missing here? Thanks!