Invalid continuation byte while reading .txt file


I'm getting this error in my Python code:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 5884: invalid continuation byte

The script is for a dictionary attack using the Crackstation dictionary. I'm trying to make this for fun, but there's a problem when I try to iterate through the items in the dictionary.

pass_file = open(pass_doc, 'r')

for word in pass_file:

pass_doc is a .txt file, NOT .csv. Does it have to be .csv?

I've tried using load_text() instead of open(), but all I want is a simple list of items. The code should run through all the items in the dictionary, stored in a list, and I don't really know what's wrong.


There is 1 best solution below

Garlic Bread Express

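The error means the file is not valid UTF-8. Byte 0xe4 is 'ä' in Latin-1 and Windows-1252, so the file was probably saved in one of those encodings. Either save the text file as UTF-8, or, if you cannot re-save it from an editor, convert it to UTF-8 with a script like this (replace "your-source-encoding" with the file's actual encoding):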

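import codecs

BLOCKSIZE = 1048576  # approximate chunk size in bytes; any other size works too

# Decode the source file from its original encoding and re-encode it as UTF-8.
# sourceFileName and targetFileName are the input and output paths.
with codecs.open(sourceFileName, "r", "your-source-encoding") as sourceFile:
    with codecs.open(targetFileName, "w", "utf-8") as targetFile:
        while True:
            contents = sourceFile.read(BLOCKSIZE)
            if not contents:  # an empty string means end of file
                break
            targetFile.write(contents)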
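
If converting the file is not practical, the built-in open() also accepts `encoding` and `errors` arguments, so the word list can be decoded while it is read. The sketch below is only an illustration: `read_words` and `demo_path` are hypothetical names, the demo file is generated on the spot, and "latin-1" is a guess at the real encoding of the dictionary, not something confirmed by the question.

```python
import os
import tempfile


def read_words(path, encoding="latin-1", errors="strict"):
    """Yield one word per line, decoding the file with the given encoding."""
    with open(path, "r", encoding=encoding, errors=errors) as handle:
        for line in handle:
            yield line.rstrip("\r\n")


# Demo file containing byte 0xe4 followed by an ASCII letter,
# which is not a valid UTF-8 sequence.
demo_path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(demo_path, "wb") as handle:
    handle.write(b"password\nm\xe4dchen\n")

# Option 1: decode with the encoding the file was (assumed to be) saved in.
latin1_words = list(read_words(demo_path, encoding="latin-1"))

# Option 2: keep UTF-8 but substitute U+FFFD for undecodable bytes.
replaced_words = list(read_words(demo_path, encoding="utf-8", errors="replace"))
```

Note that Latin-1 maps every byte to a character, so it never raises, even when it is the wrong guess. With errors="replace" the bad bytes are lost, which changes the affected words, so naming the correct encoding is preferable when it is known.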

This question might also help: UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c