Python - Unpacking a DAT file returns hex values and only one record


I am trying to unpack a binary file (file.dat), but my code unpacks only a single record while the file has more than 800. Also, when I print the result of struct.unpack(), the string field contains lots of hex escape values. Is there a way to remove those hex values and other unwanted characters (like 'N')? This is my current code:

import struct

# The with block closes the file automatically, so no explicit close() is needed.
with open('file.dat', mode="rb") as binFile:
    binData = binFile.read(26)
    data = struct.unpack('i22s', binData)
    print(data)

It gives me the following result. The integer value is correct, but the string should be just 'PROV111'. As you can see, it also reads only one record:

(1000000054, b'N\x07\x00PROV111\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

Answer by gimix

The answers are already in the comments, but let's make them more explicit.

You are reading a standard-size (i.e. 32-bit) signed int followed by a 22-byte string, but your data is in a different format: after the initial int you have an 'N' (you may know why), then a short (16-bit) signed int which holds the actual string length, and finally your string, zero-padded to 19 bytes.

So you should read your struct and then extract the actual string with something like:

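# The format string describes an int, a dummy char, a short int and the padded string.
# The '<' prefix (little-endian, no alignment padding) keeps the record at exactly 26 bytes.
number, dummy, str_len, padded_str = struct.unpack('<ish19s', binData)
true_str = padded_str[:str_len]
print(number, true_str)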
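
One thing worth checking is the record size, because struct pads fields to their native alignment when the format string has no byte-order prefix. The following is a minimal sketch, assuming the file is little-endian with no padding between fields (which is what the output in the question suggests). The sample record is rebuilt by hand from that output and is not read from the real file.

```python
import struct

# Sample record rebuilt from the question's output:
# 4-byte int, 'N', 2-byte length (7), 'PROV111', then 12 zero bytes of padding.
record = struct.pack('<i', 1000000054) + b'N\x07\x00PROV111' + b'\x00' * 12

# With '<' there is no alignment padding, so the format covers exactly 26 bytes.
packed_size = struct.calcsize('<ish19s')

# Without a prefix, native alignment usually adds a pad byte before the short,
# so this size can come out larger than 26 (platform dependent).
native_size = struct.calcsize('ish19s')

number, dummy, str_len, padded_str = struct.unpack('<ish19s', record)
name = padded_str[:str_len].decode('ascii')  # bytes -> str, without the zero padding
print(packed_size, native_size, number, dummy, name)
```

If the native size differs from 26 on your machine, unpacking a 26-byte chunk with the prefix-less format raises struct.error, so the explicit prefix is the safer choice.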

And yes, of course you should loop until EOF to read every 26-byte chunk in your file.
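
The loop could look something like the sketch below. It assumes every record uses the same 26-byte little-endian layout described above. The names read_records and make_record are made up for illustration, and the second sample record is invented test data; an in-memory stream stands in for the real file.

```python
import io
import struct

# Assumed layout: int, marker char, short string length, 19-byte zero-padded string.
RECORD = struct.Struct('<ish19s')

def read_records(stream):
    """Yield (number, text) pairs from a binary stream until EOF."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # EOF, or a trailing partial record, which is ignored here
        number, _marker, str_len, padded = RECORD.unpack(chunk)
        yield number, padded[:str_len].decode('ascii', errors='replace')

def make_record(number, text):
    """Hypothetical helper that builds one sample record for the demo."""
    raw = text.encode('ascii')
    return RECORD.pack(number, b'N', len(raw), raw)  # '19s' pads with zero bytes

# In-memory stand-in for file.dat with two sample records.
sample = io.BytesIO(make_record(1000000054, 'PROV111') +
                    make_record(1000000055, 'PROV112'))
records = list(read_records(sample))
print(records)
```

For the real file, pass the object returned by open('file.dat', 'rb') to read_records instead of the BytesIO stream. Reading one fixed-size chunk at a time keeps memory use small, and stopping on a short read avoids a struct.error at the end of the file.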