oracledb python encoding errors in outputtypehandler

87 Views Asked by At

I'm struggling to deal with what I think is corrupt data stored in an Oracle database when reading it in a python script. I have the following:

    def output_type_handler(cursor, metadata):
      if metadata.type_code is oracledb.DB_TYPE_VARCHAR:
        return cursor.var(metadata.type_code,
                          arraysize=cursor.arraysize,
                          encoding_errors="replace")

    mydb, mycursor = connectOracle()
    mycursor.outputtypehandler = output_type_handler

    mycursor.execute('''select file_id, md5_value,
                            case when file_content is not null
                                then
                                    utl_raw.cast_to_varchar2(dbms_lob.substr(file_content))
                                else
                                    ''
                                end as FILE_CONTENT from archive_file where archive_id = 123 and file_name = 'file_name.txt' ''')
    row = mycursor.fetchone()
    print(row)

When I run this, I'm getting the following:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 647: invalid continuation byte

If I run this directly in SQL Developer, I get an output, with � in the file content. My SQL Developer settings are set to UTF-8, and my Oracle database NLS_NCHAR_CHARACTERSET is set to AL32UTF8. My understanding with the outputtypehandler change is that my python script would also output a � in this instance.

I added some print() commands inside of the def output_type_handler() to verify that it is indeed being called, and I saw output, so it appears that it is, bu it also seems like the encoding_errors="replace" is being ignored. What am I missing here? Thanks!

0

There are 0 best solutions below