I have a simple Python socket server that receives "command" codes encoded in ASCII. Most of the bytes decode properly with data.decode("utf-8"), but some of them fail, and falling back to latin-1 turns them into random-looking characters.
Here are two examples:
byte_string1 = b'\xa3\xb67' # When client sends 67
byte_string2 = b'\xa3\xb6\xa3\xb6' # When client sends 66
I can see the numbers 67 and 6-6 in the raw bytes, but I have been unable to extract them. Is there a proper way to handle these?
My current attempt is below; I am expecting to get strings back from the bytes in data:
import logging
import socket

logger = logging.getLogger(__name__)


def get_command(data):
    try:
        command = data.decode("utf-8")
    except UnicodeDecodeError as err1:
        logger.debug(f"utf-8 UnicodeDecodeError: {err1} for data: {data}")
        try:
            # latin-1 maps every byte value to a character, so this never raises
            command = data.decode("latin-1")
        except UnicodeDecodeError as err2:
            logger.debug(f"latin-1 UnicodeDecodeError: {err2} for data: {data}")
            logger.debug(
                f"Taking a guess that the bytes are integers, for data: {data}"
            )
            command = [b for b in data]
    return command


server_ip = '0.0.0.0'
server_port = 1234
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind((server_ip, server_port))
server_socket.listen(5)
# Wait for a client to connect before reading from it
client_socket, client_address = server_socket.accept()
while True:
    data = client_socket.recv(1024)
    if not data:
        break
    command = get_command(data)
Your issue is that you're trying to decode a custom byte encoding using standard decoders like UTF-8 and Latin-1. If the byte strings have a specific structure, you should extract the relevant parts manually.
In your case, it appears that the command bytes are encoded in the last part of the byte string. You can slice the byte string to get the relevant bytes.
An optimized version of get_command() would slice off the leading bytes and decode only what follows. This assumes that the first two bytes are always irrelevant for your command decoding.
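A minimal sketch of that slicing idea, under two assumptions: the prefix is always exactly two bytes (the constant name `HEADER_LEN` is made up for illustration), and whatever follows is plain ASCII. Payload bytes that are not ASCII are replaced with U+FFFD rather than guessed at. Note that for the first sample in the question, b'\xa3\xb67', this keeps only "7", so it is worth checking the prefix length against what the client really sends.

```python
HEADER_LEN = 2  # assumption: the first two bytes are never part of the command


def get_command(data: bytes) -> str:
    """Drop the assumed fixed-size prefix and decode the rest as ASCII."""
    payload = data[HEADER_LEN:]
    # errors="replace" turns each non-ASCII byte into U+FFFD instead of raising
    return payload.decode("ascii", errors="replace")
```

An empty string comes back when the message is no longer than the prefix.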
Update your main loop to incorporate this:
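One way the loop might look, as a sketch rather than a drop-in replacement. The helper names `handle_client` and `serve` are hypothetical, and `get_command` is repeated here in a two-byte-prefix form only so the block runs on its own.

```python
import socket

PREFIX_LEN = 2  # assumption: two leading bytes are not part of the command


def get_command(data: bytes) -> str:
    # Skip the assumed prefix and decode what is left as ASCII
    return data[PREFIX_LEN:].decode("ascii", errors="replace")


def handle_client(client_socket: socket.socket) -> list:
    """Read from one client until it disconnects; return the decoded commands."""
    commands = []
    with client_socket:
        while True:
            data = client_socket.recv(1024)
            if not data:  # empty bytes means the peer closed the connection
                break
            commands.append(get_command(data))
    return commands


def serve(server_ip: str = "0.0.0.0", server_port: int = 1234) -> None:
    """Accept clients one at a time and print the commands each one sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_socket:
        server_socket.bind((server_ip, server_port))
        server_socket.listen(5)
        while True:
            client_socket, _client_address = server_socket.accept()
            print(handle_client(client_socket))
```

Calling serve() starts the blocking accept loop. Keep in mind that TCP is a byte stream, so one recv() call is not guaranteed to line up with one client message.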
Hopefully this solves your problem.
If the high bit of a byte is used to indicate a new header, you can scan through the byte string to detect these headers and then process the payload bytes accordingly.
Here's a function to do that:
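A rough sketch of such a scanner follows. The name `split_frames` is hypothetical, and it makes two assumptions that may not match your protocol: a run of consecutive bytes with the high bit set counts as a single header, and every byte with the high bit clear is ASCII payload belonging to the most recent header.

```python
def split_frames(data: bytes) -> list:
    """Split data into (header_bytes, payload_text) pairs.

    Assumption: bytes >= 0x80 are header bytes, bytes < 0x80 are payload.
    """
    frames = []
    header = bytearray()
    payload = bytearray()
    for b in data:
        if b & 0x80:  # high bit set: this byte belongs to a header
            if payload:
                # A header byte after payload means the previous frame is done
                frames.append((bytes(header), payload.decode("ascii")))
                header.clear()
                payload.clear()
            header.append(b)
        else:
            payload.append(b)
    if header or payload:
        # Flush whatever is left at the end of the buffer
        frames.append((bytes(header), payload.decode("ascii")))
    return frames
```

With the samples from the question, the first splits into a header of b'\xa3\xb6' with payload "7", while the second consists only of high-bit bytes and therefore yields a header with an empty payload.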
This approach assumes that a new header starts when the high bit is set. Modify as needed.