'charmap' codec can't encode character '\x92' in position 0: character maps to <undefined>


I am trying to port some functions from C++ to Python using the Qt library (PySide2 in Python). So far, porting the code and adapting it to Python has worked correctly, but when I started on the function that "translates" some files, I got different results and sometimes errors.

I should get this:

Result 1

but I get this instead:

Result 2

or, if I wrap every byte I append to the array in chr(), I get this:

Result 3

I am quite new to dealing with bytes and bytearrays, so I don't know whether I should decode each result the algorithm produces before saving it, or save every byte in the QByteArray and decode it once it is complete. If I try the latter, I get an "OverflowError" with no further context at this line: decryptedFile.append(currentByte ^ 0x33)

I would like to fix this code to work correctly. Thank you all!

This is the original function in C++:

QByteArray NosTextDatFileDecryptor::decrypt(QByteArray &array) {
    QByteArray decryptedFile;
    int currIndex = 0;
    while (currIndex < array.size()) {
        unsigned char currentByte = array.at(currIndex);
        currIndex++;
        if (currentByte == 0xFF) {
            decryptedFile.push_back(0xD);
            continue;
        }
        int validate = currentByte & 0x7F;
        if (currentByte & 0x80) {
            for (; validate > 0; validate -= 2) {
                if (currIndex >= array.size())
                    break;
                currentByte = array.at(currIndex);
                currIndex++;
                int firstByte = cryptoArray.at((currentByte & 0xF0) >> 4);
                decryptedFile.push_back(firstByte);
                if (validate <= 1)
                    break;
                int secondByte = cryptoArray.at(currentByte & 0xF);
                if (!secondByte)
                    break;
                decryptedFile.push_back(secondByte);
            }
        } else {
            for (; validate > 0; --validate) {
                if (currIndex >= array.size())
                    break;
                currentByte = array.at(currIndex);
                currIndex++;
                decryptedFile.push_back(currentByte ^ 0x33);
            }
        }
    }
    return decryptedFile;
}

And this is my code for the Python version of the project:

from PySide2.QtCore import QByteArray

def dat_file_decryptor(array):
    decryptedFile = QByteArray()
    currIndex = 0
    while currIndex < array.size():
        currentByte = ord(array[currIndex]) #unsigned char
        currIndex += 1
        if currentByte == 0xFF:
            decryptedFile.append(0xD)
            #pass
        validate = currentByte & 0x7F
        if currentByte & 0x80:
            while validate > 0:
                if currIndex >= array.size():
                    break
                currentByte = ord(array[currIndex])
                currIndex += 1
                firstByte = cryptoArray[(currentByte & 0xF0) >> 4]
                decryptedFile.append(firstByte)
                if validate <= 1:
                    break
                secondByte = cryptoArray[currentByte & 0xF]
                if not secondByte:
                    break
                decryptedFile.append(secondByte)
                validate -= 2
        else:
            while validate > 0:
                if currIndex >= array.size():
                    break
                currentByte = ord(array[currIndex])
                currIndex +=1
                decryptedFile.append(chr(currentByte ^ 0x33)) #If I don't use chr() here I get an OverFlow error
                validate -= 1
    return decryptedFile

If you want to try it yourself, this is the data you will need:

array = (
    b'\n\x10\x13rPG\x13wRGR'
    b'\xff$\x10'
    b'\x0e\x0e\x0e\x0e\x0e\x0e\x0e'
    b'\x0e\x0e\x0e\x0e\x0e\x0e\x0e'
    b'\x0e\x0e\x0e\x0e\x0e\x0e\x0e'
    b'\x0e\x0e\x0e\x0e\x0e\x0e\x0e'
    b'\x0e\x0e\x0e\x0e\x0e\x0e\x0e'
    b'\xff\x04wRGR\x89\x15\x15\x15\x15@'
    b'\xff\x04wRGR\x88\x16\x15\x16\x1a'
    b'\xff\x04wRGR\x88\x17\x15\x17\x1c'
    b'\xff\x04wRGR\x89\x18\x15\x18\x15@'
    b'\xff\x04wRGR\x88\x19\x15\x19\x1a'
    b'\xff\x04wRGR\x88\x1a\x15\x1a\x1a'
    b'\xff\x04wRGR\x88\x1b\x16\x15\x17'
    b'\xff\x04wRGR\x88\x1c\x16\x16\x16'
    b'\xff\x04wRGR\x88\x1d\x16\x17\x17'
    b'\xff\x04wRGR\x89\x15Aa\x81p'
    b'\xff\x04wRGR\x89\x15Qa\x91P'
    b'\xff\x04wRGR\x89\x15aa\xa1P'
    b'\xff\x04wRGR\x89\x15qqQ`'
    b'\xff\x04wRGR\x89\x15\x81qap'
    b'\xff\x04wRGR\x89\x15\x91qq`'
    b'\xff\x04wRGR\x89\x15\xa1q\x81p'
    b'\xff\x04wRGR\x89\x15\xb1q\x91p'
    b'\xff\x04wRGR\x89\x15\xc1q\xa1`'
    b'\xff\x04wRGR\x89\x15\xd1\x81QP'
    b'\xff\x04wRGR\x89\x16A\x91Q`'
    b'\xff\x04wRGR\x89\x16Q\xa1QP'
    b'\xff\x04wRGR\x8a\x16a\xb1QT'
    b'\xff\x04wRGR\x89\x16q\xb1a\x90'
    b'\xff\x03V]W'
    b'\xff$\x10'
    b'\x0e\x0e\x0e\x0e\x0e\x0e\x0e'
    b'\x0e\x0e\x0e\x0e\x0e\x0e\x0e'
    b'\x0e\x0e\x0e\x0e\x0e\x0e\x0e'
    b'\x0e\x0e\x0e\x0e\x0e\x0e\x0e'
    b'\x0e\x0e\x0e\x0e\x0e\x0e'
    b'\x10'
    b'\xff\x07\x10\x13gZG_V'
    b'\xff\tr:\x02:IG@\x02V'
    b'\xff\tr:\x01:IG@\x01V'
    b'\xff\tr:\x00:IG@\x00V'
    b'\xff\tr:\x07:IG@\x07V'
    b'\xff\tr:\x06:IG@\x06V'
    b'\xff\x07r:\x05:IG@\x84v\xd4\x01V'
    b'\xff\x07r:\x04:IG@\x84v\xd5\x01V'
    b'\xff\x01M'
    b'\xff'
)
Best answer

It seems the problem is that you have missed translating the C++ continue statement into Python.

Replace the commented out line

            #pass

with

            continue

as Python also has a continue statement, and your code should work.

Within a loop, a continue statement causes the rest of the current iteration of the loop to end, and execution resumes from the next iteration.
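
As a small illustration of the difference, here is a sketch unrelated to Qt. The function name collect and its use_continue flag are made up for this example; the flag only exists so both behaviours can be shown side by side.

```python
def collect(values, use_continue):
    """Append 'CR' for 0xFF; every other value is appended unchanged."""
    out = []
    for value in values:
        if value == 0xFF:
            out.append("CR")
            if use_continue:
                continue  # skip the rest of this iteration
        # Without the continue above, 0xFF also falls through to this line.
        out.append(value)
    return out


print(collect([1, 0xFF, 2], use_continue=True))   # [1, 'CR', 2]
print(collect([1, 0xFF, 2], use_continue=False))  # [1, 'CR', 255, 2]
```

In the second call the 0xFF value is processed twice, which is the same kind of fall-through that happens in your function.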

It seems the code is doing some kind of decompression/decoding on the incoming data, and it can operate in either of two modes:

  • 'XOR' mode, where bytes from the input are XOR-ed with 0x33 before being appended to the output,
  • 'Decompression' mode, where the output characters are read from cryptoArray, which is assumed to contain 16 frequently occurring characters. The top four bits and the bottom four bits of each input byte are each an index into cryptoArray, selecting the character to output.

The decoding proceeds by first reading a control byte that says which mode to use and how much to decode in that mode. The top bit of this byte is clear for XOR mode and set for decompression mode, and the remaining seven bits give a count: the number of bytes to read in XOR mode, or the number of characters to output in decompression mode (up to two per input byte). The special byte \xff indicates that a linebreak (a carriage return, 0x0D) should be output instead.
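
The control-byte layout described above could be sketched like this. The helper name parse_control_byte and the mode labels are my own, not something from your code.

```python
def parse_control_byte(value):
    """Split a control byte into (mode, count) as described above."""
    if value == 0xFF:
        # Special case: emit a linebreak, no payload follows.
        return ("linebreak", 0)
    # Top bit selects the mode, low seven bits are the count.
    mode = "decompress" if value & 0x80 else "xor"
    return (mode, value & 0x7F)


print(parse_control_byte(0x0A))  # ('xor', 10)
print(parse_control_byte(0x89))  # ('decompress', 9)
print(parse_control_byte(0xFF))  # ('linebreak', 0)
```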

Your source data starts with \n\x10\x13rPG\x13wRGR, and this says to read 10 bytes in XOR mode (\n is character 10). XORing the 10 bytes \x10\x13rPG\x13wRGR with 0x33 gives you the text # Act Data.
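
You can check this XOR step on its own with plain Python bytes, no Qt needed. The variable names are just for illustration.

```python
# The 10 payload bytes that follow the control byte \n (0x0A).
chunk = b"\x10\x13rPG\x13wRGR"

# Iterating over bytes yields ints, so each one can be XOR-ed directly.
decoded = bytes(b ^ 0x33 for b in chunk)

print(decoded)  # b'# Act Data'
```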

Later on in your data, there is the sequence \x89\x15\x15\x15\x15@. This says to read 9 characters in decompression mode, at indexes 1, 5, 1, 5, 1, 5, 1, 5 and 4 in cryptoArray. (@ is character \x40.) This corresponds to 1 1 1 10 from your output, so from this we can deduce that the characters at indexes 1, 5 and 4 in cryptoArray are space, 1 and 0 respectively.
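
To see where those indexes come from, here is a minimal sketch that splits each payload byte into its two nibbles. The helper nibble_indexes is hypothetical, and it ignores the early exit your code takes when the second table entry is zero.

```python
def nibble_indexes(payload, count):
    """Return the first `count` 4-bit indexes packed into `payload`."""
    indexes = []
    for byte in payload:
        indexes.append(byte >> 4)    # top four bits
        indexes.append(byte & 0x0F)  # bottom four bits
    return indexes[:count]


# Payload following the control byte \x89 (decompression mode, count 9).
print(nibble_indexes(b"\x15\x15\x15\x15@", 9))
# [1, 5, 1, 5, 1, 5, 1, 5, 4]
```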

You didn't specify the contents of cryptoArray in your question, but the following seemed to produce the correct output for me:

#                          0123456789abcdef
cryptoArray = QByteArray(b'X XX0123456789XX')

The Xs signify bytes that weren't used in this particular conversion, so I can't say what they are in your code.

So why did your code generate incorrect output? The missing continue statement is in the handling for the \xff byte, which indicates to output a linebreak. Without a continue statement, the code would output a linebreak and then incorrectly attempt to decode 127 characters from the next 64 bytes of your data in decompression mode.
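
The numbers 127 and 64 follow directly from the bit operations in the function; a quick check (variable names are mine):

```python
control = 0xFF

count = control & 0x7F                # low seven bits
is_decompress = bool(control & 0x80)  # top bit set means decompression mode
bytes_needed = (count + 1) // 2       # two characters per input byte, rounded up

print(count, is_decompress, bytes_needed)  # 127 True 64
```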

Anyway, your function dat_file_decryptor returns a QByteArray. I assigned it to a variable result, and I could get the desired output from your source data by running the following line:

print(bytes(result).decode("ascii").replace("\r", "\n"))

Disclaimer: I haven't tested this in PySide, only PyQt5. I don't know if that makes a difference.
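
If you want to experiment without Qt at all, the same algorithm can be sketched with plain bytes and bytearray, where every element is already an int in the range 0 to 255. This is only a sketch under assumptions: the names decrypt, XOR_KEY and CRYPTO_TABLE are mine, and the table is my guess from above, with X standing in for entries I could not deduce from your data.

```python
XOR_KEY = 0x33
# Assumption: guessed table, 'X' marks entries that are unknown to me.
CRYPTO_TABLE = b"X XX0123456789XX"


def decrypt(data, table=CRYPTO_TABLE):
    """Pure-Python port of the C++ decrypt(), including the continue."""
    out = bytearray()
    i = 0
    while i < len(data):
        control = data[i]
        i += 1
        if control == 0xFF:
            out.append(0x0D)  # carriage return
            continue          # the statement missing from the question's port
        count = control & 0x7F
        if control & 0x80:
            # Decompression mode: up to two table lookups per input byte.
            while count > 0 and i < len(data):
                current = data[i]
                i += 1
                out.append(table[current >> 4])
                if count <= 1:
                    break
                second = table[current & 0x0F]
                if not second:
                    break
                out.append(second)
                count -= 2
        else:
            # XOR mode: one output byte per input byte.
            while count > 0 and i < len(data):
                out.append(data[i] ^ XOR_KEY)
                i += 1
                count -= 1
    return bytes(out)


sample = b"\n\x10\x13rPG\x13wRGR\xff\x04wRGR\x89\x15\x15\x15\x15@\xff"
print(decrypt(sample))  # b'# Act Data\rData 1 1 1 10\r'
```

The sample is the start of your data followed by the first Data record, so the output can be compared against the beginning of your expected result.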