Process mixed bytes data into a Python list


I am reading remote .dat files for EDI data processing.

The original data is a Base64-encoded byte string:

b'MDA1MDtWMjAxOS44LjAuMDtWMjAxOS44LjAuMDsyMDIwMD.........'

I decoded it with base64 as below:

byte_data = base64.b64decode(byte_data)

That gave me the byte data below. Is there a better way to process this byte data into a Python list?

b"0050;V2019.8.0.0;V2019.8.0.0;20200407;184821\r\n0070;;7;0;7\r\n0080;11;50;bot.pdf;Driss;C:\\Dat\\Abl\\\r\n0090;1;Z;Zub\xf6r;0;0;0;Zub\xf6r;;;Zub\xf6r\r\n

I tried decoding it as UTF-8, but that didn't work.

byte_data.decode('utf-8')

I tried to convert it to a string and read it as CSV, but that did not help; I landed back on the original data. I need to keep most of the string as it is and convert the \xf6, \r and \n sequences.

import csv
import io

data = io.StringIO(above_data)  # above_data: the byte data above, converted to str
data.seek(0)
csv_reader = csv.reader(data, delimiter=";")

There are 2 best solutions below

mugiseyebrows (best answer)

It didn't work with 'utf-8' because the data isn't UTF-8; it's probably 'ISO-8859-1' (Latin-1):

text = byte_data.decode('ISO-8859-1')

because \xf6 is ö in 'ISO-8859-1'
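
To get from there to the Python list the question asks for, the decoded text can be fed to csv.reader. This is only a minimal sketch: it assumes the data really is ISO-8859-1, and the sample bytes are just the records visible in the question, not the full file.

```python
import csv
import io

# Sample bytes: only the records visible in the question (the real file is longer).
byte_data = (
    b"0050;V2019.8.0.0;V2019.8.0.0;20200407;184821\r\n"
    b"0070;;7;0;7\r\n"
    b"0080;11;50;bot.pdf;Driss;C:\\Dat\\Abl\\\r\n"
    b"0090;1;Z;Zub\xf6r;0;0;0;Zub\xf6r;;;Zub\xf6r\r\n"
)

# Assumption: the file is ISO-8859-1 (Latin-1), where byte 0xF6 is "ö".
text = byte_data.decode("ISO-8859-1")

# newline="" leaves the \r\n line endings for the csv module to handle.
rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=";"))

for row in rows:
    print(row)
```

Each record becomes one list of string fields, and the backslashes in the Windows path are kept literally because csv.reader uses no escape character by default.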

Amiga500

Is it definitely utf-8 encoded?

This might help you work out which decoder to use:

import chardet  # third-party package: pip install chardet
print(chardet.detect(byte_data))
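
If installing a detection library is not an option, a rough standard-library alternative is to try a short list of candidate encodings in order. The helper name decode_with_fallback and the candidate list are made up for this sketch; ISO-8859-1 has to come last because it accepts any byte sequence and so never fails.

```python
def decode_with_fallback(data, encodings=("utf-8", "ISO-8859-1")):
    """Return (text, encoding) for the first codec that decodes data without error.

    Hypothetical helper: the candidate list is an assumption, not a detection
    algorithm. ISO-8859-1 maps every byte to a character, so keep it last.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes, try the next one
    raise ValueError("none of the candidate encodings matched")


# One record from the question: 0xF6 is not valid UTF-8, so the fallback is used.
text, used = decode_with_fallback(b"0090;1;Z;Zub\xf6r;0")
print(used, text)
```

This only tells you which codec did not raise an error, not that the result is the intended text, so it is worth checking a few decoded records by eye.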