Is there a better way to represent uint8 data?


I have a question about how to represent uint8 data.

I have a MATLAB MAT file containing uint8 data.

The MAT file is read into Python using scipy.io.loadmat(), which returns a dictionary. The array corresponding to the data field of the MAT file is extracted from that dictionary. The array looks like:

array([[162], [122], ..., [135]], dtype=uint8)

The array is then saved to a text file so it can be read back later. However, the values in the text file are written in double-precision exponential notation. For example, the uint8 value 162 is saved as 1.620000000000000000e+02. This is not what I want, because it takes up far too much space.

What I need is for each byte returned by source_file.read() to correspond to one uint8 value.

Is there a better way to represent the uint8 data? Converting each uint8 value to a string is possible, but that still takes 2-3 bytes per value.


There are 2 best solutions below


You're presumably using numpy.savetxt, which has default fmt argument '%.18e', meaning "in exponential format with 18 digits of precision."

You could change it to something that'll spit out an integer (e.g. fmt='%d'), but that's still quite inefficient in terms of file space usage (since it's an ASCII-encoded integer).
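
As a rough illustration of the difference between the two formats, here is a minimal sketch. The three-element array is a hypothetical stand-in for the data loaded from the MAT file, and the file names are made up for the example:

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the column vector loaded from the MAT file.
data = np.array([[162], [122], [135]], dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    default_path = os.path.join(tmp, "default.txt")
    int_path = os.path.join(tmp, "int.txt")

    np.savetxt(default_path, data)        # default fmt='%.18e'
    np.savetxt(int_path, data, fmt="%d")  # plain decimal integers

    default_size = os.path.getsize(default_path)
    int_size = os.path.getsize(int_path)

    with open(int_path) as f:
        int_text = f.read()

print(default_size, int_size)  # the '%d' file is much smaller
print(repr(int_text))          # one ASCII integer per line
```

Even with fmt='%d', each value costs its decimal digits plus a line ending, so the file is still several times larger than one byte per value.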

numpy.save writes a much more efficient binary format, which is much closer to what you're asking for, though the file includes a header (a description of the format). If you want just the raw binary data, then tostring (spelled tobytes in current NumPy, where tostring is deprecated), as suggested by dbaupp, is the way to go:

with open('outfile', 'wb') as f:
    # tobytes() is the current name for the deprecated tostring()
    f.write(the_array.tobytes())
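
To check that this really gives one byte per value, and to see what the numpy.save header costs, something like the following sketch could be used. The array contents and file names are assumptions for illustration; numpy.fromfile is one way to read the raw bytes back:

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the array extracted from the MAT file.
the_array = np.array([[162], [122], [135]], dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "outfile")
    npy_path = os.path.join(tmp, "outfile.npy")

    # Raw dump: exactly one byte per uint8 value, no header.
    with open(raw_path, "wb") as f:
        f.write(the_array.tobytes())

    # .npy file: same data plus a header describing dtype and shape.
    np.save(npy_path, the_array)

    raw_size = os.path.getsize(raw_path)
    npy_size = os.path.getsize(npy_path)

    # Each byte from read() corresponds to one uint8 value.
    with open(raw_path, "rb") as source_file:
        raw_bytes = source_file.read()

    # Reading the raw file back requires supplying the dtype by hand,
    # and the original (N, 1) shape is not stored anywhere.
    restored = np.fromfile(raw_path, dtype=np.uint8)
    loaded = np.load(npy_path)

print(raw_size, npy_size)
print(list(raw_bytes))
print(restored.shape, loaded.shape)
```

The trade-off is that the raw file is as small as possible but carries no dtype or shape information, whereas the .npy file restores the array exactly at the cost of a fixed-size header.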

NumPy has tostring() and fromstring(), which convert between an ndarray and a binary string (in current NumPy these are deprecated in favor of tobytes() and frombuffer()). For example:

> a = numpy.array([162,122,135], dtype=numpy.uint8)
> a.tostring()
'\xa2z\x87'

(That string consists of the three bytes '\xa2', 'z', and '\x87'; each \x.. escape represents a single byte.)
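
For completeness, here is a small round-trip sketch under Python 3, where the result is a bytes object. It uses tobytes() and numpy.frombuffer(), the current replacements for tostring() and binary-mode fromstring():

```python
import numpy as np

a = np.array([162, 122, 135], dtype=np.uint8)

# Serialize: one byte per uint8 element.
raw = a.tobytes()

# Deserialize: the dtype must be given, since raw bytes carry no metadata.
b = np.frombuffer(raw, dtype=np.uint8)

print(raw)       # b'\xa2z\x87'
print(len(raw))  # 3
print(b)
```

Note that the array returned by frombuffer() is a read-only view of the bytes object; call b.copy() if a writable array is needed.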