Python cbor2 encode float in preferred (efficient) format

The CBOR docs state that the most efficient encoding (the one using the fewest bytes) should be preferred.

Floats can be encoded as 16-bit, 32-bit or 64-bit floats or, using tagged extensions, in BigFloat or decimal fraction formats.

Standard 64-bit encoding uses 9 bytes. Some floating-point values can take much less space in an alternative format (e.g. the values 0.0, 1.0 and 1.5 can each be represented in 4 bytes using BigFloats).

Other values are better represented as standard floats (e.g. 0.123456789 takes 9 bytes as a 64-bit float but 29 bytes as a BigFloat).

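The cbor2 Python library emits the tagged format when given the Decimal type (strictly, the \xc4 prefix in the output below is tag 4, a decimal fraction, rather than tag 5, a bigfloat) and a CBOR float when given the float type.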

How can I get cbor2 to automatically emit the most efficient type depending on the actual value?

I have tried various arbitrary values using cbor2.dumps(). Floats are always encoded as CBOR floats, and Decimal values are always encoded as CBOR BigFloats.

>>> from decimal import Decimal
>>> from cbor2 import dumps
>>> x=0.0 ; x ; d1 = dumps(x) ; d1 ; len(d1) ; dx = Decimal(x) ; d2 = dumps(dx) ; d2 ; len(d2)
0.0
b'\xfb\x00\x00\x00\x00\x00\x00\x00\x00'
9
b'\xc4\x82\x00\x00'
4

>>> x=1.0 ; x ; d1 = dumps(x) ; d1 ; len(d1) ; dx = Decimal(x) ; d2 = dumps(dx) ; d2 ; len(d2)
1.0
b'\xfb?\xf0\x00\x00\x00\x00\x00\x00'
9
b'\xc4\x82\x00\x01'
4

>>> x=1.5 ; x ; d1 = dumps(x) ; d1 ; len(d1) ; dx = Decimal(x) ; d2 = dumps(dx) ; d2 ; len(d2)
1.5
b'\xfb?\xf8\x00\x00\x00\x00\x00\x00'
9
b'\xc4\x82 \x0f'
4

>>> x=0.123456789 ; x ; d1 = dumps(x) ; d1 ; len(d1) ; dx = Decimal(x) ; d2 = dumps(dx) ; d2 ; len(d2)
0.123456789
b'\xfb?\xbf\x9a\xdd79c_'
9
b'\xc4\x8287\xc2W\x80\xe5\x18Js\xc0\xe4\x8f-\xf1\xc9\xf0\x90\xf4u%+\x93\xa7\n\x88\xa2?'
29
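
The byte counts above can be reproduced by hand, which helps explain why the Decimal form is sometimes 4 bytes and sometimes 29. The following is a minimal standard-library sketch, not cbor2's own code: the helper names (encode_head, encode_int, encode_decimal_fraction) are hypothetical, and it assumes the layout seen in the output above, namely tag 4 wrapping a two-element array of [exponent, mantissa], with the mantissa falling back to a bignum once it no longer fits in 64 bits.

```python
from decimal import Decimal


def encode_head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte plus the shortest argument that holds n."""
    if n < 24:
        return bytes([(major << 5) | n])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + n.to_bytes(size, "big")
    raise ValueError("argument does not fit in 8 bytes")


def encode_int(n: int) -> bytes:
    """Encode an integer, falling back to a bignum (tag 2 or 3) when needed."""
    major, magnitude = (1, -1 - n) if n < 0 else (0, n)
    if magnitude < 1 << 64:
        return encode_head(major, magnitude)
    payload = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    tag = 3 if n < 0 else 2
    return encode_head(6, tag) + encode_head(2, len(payload)) + payload


def encode_decimal_fraction(d: Decimal) -> bytes:
    """Encode a finite Decimal as tag 4 wrapping [exponent, mantissa]."""
    sign, digits, exponent = d.as_tuple()
    mantissa = int("".join(map(str, digits)))
    if sign:
        mantissa = -mantissa
    return (
        encode_head(6, 4)        # tag 4: decimal fraction
        + encode_head(4, 2)      # array of two items
        + encode_int(exponent)
        + encode_int(mantissa)
    )


for x in (0.0, 1.0, 1.5, 0.123456789):
    print(x, len(encode_decimal_fraction(Decimal(x))))
```

Decimal(1.5) has a one-byte exponent and a one-byte mantissa, whereas Decimal(0.123456789) carries the exact binary expansion of the float, so its mantissa is a long integer that needs a 23-byte bignum.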
There is 1 answer below

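So I found that the answer is a combination of two things: passing the canonical=True argument to dumps(), which makes cbor2 emit the shortest float width (16, 32 or 64 bits) that represents the value exactly, and, where some loss of precision is acceptable, casting the floats to lower-precision floats (using numpy) first.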

NOTE: you have to cast back to a Python float, as cbor2 can't encode numpy scalar types at the moment.
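
If pulling in numpy only for the cast is undesirable, the same rounding can probably be done with the standard library. This is a small sketch under that assumption: round_to_width is a hypothetical helper name, and it relies on struct's ">e" (16-bit) and ">f" (32-bit) formats rounding to the nearest representable value.

```python
import struct


def round_to_width(value: float, fmt: str) -> float:
    """Round a Python float to a narrower IEEE 754 width and back.

    fmt is a struct format: ">e" for 16-bit, ">f" for 32-bit.
    Raises OverflowError if the value is too large for that width.
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


print(round_to_width(0.123456789, ">f"))
print(round_to_width(0.123456789, ">e"))
```

The result is already a plain Python float, so no cast back is needed before passing it to dumps().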

>>> import numpy as np
>>> from cbor2 import dumps
>>> x=0.123456789 ; x ; d1=dumps(x, canonical=True) ; d1 ; len(d1)
0.123456789
b'\xfb?\xbf\x9a\xdd79c_'
9

>>> x=float( np.float32( 0.123456789 ) ) ; x ; d1=dumps(x, canonical=True) ; d1 ; len(d1)
0.12345679104328156
b'\xfa=\xfc\xd6\xea'
5

>>> x=float( np.float16( 0.123456789 ) ) ; x ; d1=dumps(x, canonical=True) ; d1 ; len(d1)
0.12347412109375
b'\xf9/\xe7'
3
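
The 9, 5 and 3 byte results above are consistent with a simple rule: try the narrowest float width first and keep it only if the value survives the round trip unchanged. The sketch below illustrates that rule with the standard library only. It is not cbor2's implementation, encode_float_preferred is a hypothetical name, and encoding NaN as the 16-bit pattern 0x7e00 is an assumption about what a canonical encoder would choose.

```python
import math
import struct


def encode_float_preferred(value: float) -> bytes:
    """Encode a float as the shortest CBOR float that round-trips exactly."""
    if math.isnan(value):
        # NaN never compares equal to itself, so handle it up front.
        return b"\xf9\x7e\x00"
    # 0xf9 = 16-bit float, 0xfa = 32-bit float, 0xfb = 64-bit float.
    for head, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this width
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([head]) + packed
    return b"\xfb" + struct.pack(">d", value)


for x in (0.0, 1.0, 1.5, 0.123456789, 0.12345679104328156, 0.12347412109375):
    encoded = encode_float_preferred(x)
    print(x, encoded, len(encoded))
```

Under this rule 0.0, 1.0 and 1.5 each fit in 3 bytes with no loss of precision, so only values such as 0.123456789 need the explicit lower-precision cast to shrink.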