Reading binary data in python (to replace C code)

481 Views Asked by At

I am writing a python program to replace a C program which, amongst other things, received data from a microcontroller. This was done in C using a simple socket and the read function. In my python program I am able to read in a string of data from the micro controller, but I can't seem to get it into a readable format. I have captured this string and written a much smaller program which attempts to convert it into just a list of numbers:

import array
import thread
import socket
import time
import math
import numpy as np
import struct

data = open("rawfile.txt", 'r')
conv = open("conv.bin", 'wb')
pack = open("pack.txt", 'w')


# This line reads in the data string from the file
rawdata = data.read()

length = len(rawdata)

unpack = np.zeros(length, dtype=np.int64)
inter = np.int64

size = 4
m=0

for n in range(0,length-size):
    inter = struct.unpack_from('h',rawdata,n)
    unpack[m] = inter[0]
    m=m+1
    n=n+4

conv.write(unpack)
for j in range(0,len(unpack)):
    #print unpack[j]
    stringtowrite = str(unpack[j])
    pack.write(stringtowrite)
    pack.write(',')

#conv.write(dat2)
print "Done"

Here is the data which this program produces (plotted in Matlab), and what the data should look like: (the cleaner pulse is what it should look like)

Data from program What the data should look like

Any help would be greatly appreciated, I've been struggling with this for weeks. I can upload the raw data file, but I wasn't sure how to do this as its quite large.

So in summary, my question is why does the program produce the data for the first image and not the second, and is there anything obviously wrong with the way I'm reading in and converting the data.

Thanks in advance !

EDIT/UPDATE:

Thanks to the great answers below, i am now using this:

dt = np.dtype('int16')
unpack = np.zeros(302000, dtype=dt)
unpack = np.fromfile(data, dtype=dt)

conv.write(unpack)

and the data looks better! The first image is with dtype('int16') and the second is with dtype('int32'). I have also found out that the data I'm reading is comprised of alternating real/imaginary numbers, should this change the format string I use for the numpy dtype ? As far as I'm aware there was no step in the C code which accounted for this.

Final Update:

To avoid confusion for anyone else, the second two images above are reading the data in correctly, it was an issue in the actual data causing them to not look right.

int16 int32

2

There are 2 best solutions below

2
On BEST ANSWER

There are multiple problems here.

You are iterating over bytes of data, but interpreting that data as 2 byte units. The offset is the offset in bytes, not the offset in units of fmt. So the data is laid out like so (number being the index of the integer):

1 1 2 2 3 3 4 4 5 5 6 6

But you are reading the data like this:

1 1  
  1 2
    2 2
      2 3
        3 3
          3 4
            4 4
              4 5
                5 5
                  5 6
                    6 6

Rather than like this:

1 1
    2 2
        3 3
            4 4
                5 5
                    6 6

You would need to use range(0, length-size, 2) to iterate in steps of 2.

However, Python ranges are half-open, they exclude the last value. So you are dropping the last sample currently. If you don't mean to, leave off the -size.

This is not the idiomatic way to loop over bytes, however. The better way would be to iterate over them directly:

for idata in rawdata[::2]:

This would iterate over pairs of bytes.

But this is a roundabout way to do this if you are already using numpy. numpy has a very fast way to unpack an entire array from binary data: fromstring. You can just read in the data all at once like so:

unpack = np.fromstring(data.read(), dtype=np.int16)

Also, your size or format is wrong, h is 2 bytes but size is 4. Also, you are shadowing the builtin size function. Use a different variable name.

Also, this is being handled as binary data, so you should open the file as binary. If you do this, you can use numpy's fromfile in the same way, but without having to do the read first, which would be even faster:

unpack = np.fromfile(data, dtype=np.int16)
0
On

Since the actual data looks quantized vertically, you probably swapped the byte order.

Try unpack[m] = (inter[0] % 0x100) * 0x100. If that looks nice, you have to unpack the data using the different byte order.