Python ver. 3.11.5 on Windows 10
I have a directory filled with .gz text archives. To scan these archives, I use the following Python code:
with gzip.open(logDir+"\\"+fileName, mode="rb") as archive:
    for filename in archive:
        print(filename.decode().strip())
This all used to work. However, the new system adds lines similar to this:
:§f Press [§bJ§f]
Python gives me this error:
  File "C:\Users\Me\Documents\Python\ConvertLog.py", line 16, in readZIP
    print(filename.decode().strip())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa7 in position 49: invalid start byte
Anyone know a way of dealing with strange characters that pop up? I can't just ignore the line. This happens to be one of the few lines I need to strip out and write to a condensed report.
I tried other modes besides "rb". I really have no idea what else to try.
You can choose how decoding errors are handled by calling decode() a bit differently, which you can read more about in the documentation.
In decode(), you can specify errors='strict', errors='ignore', or errors='replace'. If unspecified, strict is the default, and it raises an error in a situation like yours. ignore simply skips the invalid bytes. replace substitutes a "suitable replacement character" for each of them. So, one way this might be implemented could be:
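As a minimal sketch (not necessarily the exact code intended here), the loop from the question can pass errors= to decode(). The helper name read_lines and the sample archive are made up for illustration. The cp1252 call is an extra assumption: byte 0xa7 is § in cp1252 and Latin-1, so if the logs are really written in one of those encodings, decoding with it would keep the character rather than replace or drop it.

```python
import gzip
import os
import tempfile


def read_lines(path, encoding="utf-8", errors="replace"):
    """Yield each line of a .gz text archive, decoded with the given error policy.

    Hypothetical helper; it mirrors the loop from the question.
    """
    with gzip.open(path, mode="rb") as archive:
        for raw_line in archive:
            yield raw_line.decode(encoding, errors=errors).strip()


# Build a small sample archive containing the problematic byte 0xa7.
sample = b"plain line\n:\xa7f Press [\xa7bJ\xa7f]\n"
sample_path = os.path.join(tempfile.mkdtemp(), "sample.log.gz")
with gzip.open(sample_path, "wb") as f:
    f.write(sample)

# errors="replace": each invalid byte becomes U+FFFD, the line is kept.
replaced = list(read_lines(sample_path, errors="replace"))

# errors="ignore": invalid bytes are silently dropped.
ignored = list(read_lines(sample_path, errors="ignore"))

# Assumption: the file is actually cp1252, so 0xa7 decodes to the section sign.
as_cp1252 = list(read_lines(sample_path, encoding="cp1252", errors="strict"))
```

With either replace or ignore the line is still returned, so it can be matched and written to the condensed report; only the cp1252 variant preserves the § characters themselves.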