I am writing a python program to replace a C program which, amongst other things, received data from a microcontroller. This was done in C using a simple socket and the read function. In my python program I am able to read in a string of data from the micro controller, but I can't seem to get it into a readable format. I have captured this string and written a much smaller program which attempts to convert it into just a list of numbers:
import array
import thread
import socket
import time
import math
import numpy as np
import struct
data = open("rawfile.txt", 'r')
conv = open("conv.bin", 'wb')
pack = open("pack.txt", 'w')
# This line reads in the data string from the file
rawdata = data.read()
length = len(rawdata)
unpack = np.zeros(length, dtype=np.int64)
inter = np.int64
size = 4
m=0
for n in range(0,length-size):
inter = struct.unpack_from('h',rawdata,n)
unpack[m] = inter[0]
m=m+1
n=n+4
conv.write(unpack)
for j in range(0,len(unpack)):
#print unpack[j]
stringtowrite = str(unpack[j])
pack.write(stringtowrite)
pack.write(',')
#conv.write(dat2)
print "Done"
Here is the data which this program produces (plotted in Matlab), and what the data should look like: (the cleaner pulse is what it should look like)
Any help would be greatly appreciated, I've been struggling with this for weeks. I can upload the raw data file, but I wasn't sure how to do this as its quite large.
So in summary, my question is why does the program produce the data for the first image and not the second, and is there anything obviously wrong with the way I'm reading in and converting the data.
Thanks in advance !
EDIT/UPDATE:
Thanks to the great answers below, i am now using this:
dt = np.dtype('int16')
unpack = np.zeros(302000, dtype=dt)
unpack = np.fromfile(data, dtype=dt)
conv.write(unpack)
and the data looks better! The first image is with dtype('int16') and the second is with dtype('int32'). I have also found out that the data I'm reading is comprised of alternating real/imaginary numbers, should this change the format string I use for the numpy dtype ? As far as I'm aware there was no step in the C code which accounted for this.
Final Update:
To avoid confusion for anyone else, the second two images above are reading the data in correctly, it was an issue in the actual data causing them to not look right.
There are multiple problems here.
You are iterating over bytes of data, but interpreting that data as 2 byte units. The
offset
is the offset in bytes, not the offset in units offmt
. So the data is laid out like so (number being the index of the integer):But you are reading the data like this:
Rather than like this:
You would need to use
range(0, length-size, 2)
to iterate in steps of 2.However, Python ranges are half-open, they exclude the last value. So you are dropping the last sample currently. If you don't mean to, leave off the
-size
.This is not the idiomatic way to loop over bytes, however. The better way would be to iterate over them directly:
This would iterate over pairs of bytes.
But this is a roundabout way to do this if you are already using numpy. numpy has a very fast way to unpack an entire array from binary data: fromstring. You can just read in the data all at once like so:
Also, your
size
or format is wrong,h
is 2 bytes butsize
is 4. Also, you are shadowing the builtinsize
function. Use a different variable name.Also, this is being handled as binary data, so you should open the file as binary. If you do this, you can use numpy's fromfile in the same way, but without having to do the
read
first, which would be even faster: