tick label appears with different precision due to IEEE 754 float precision


For example:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0.0,1.2,0.2)
y = np.arange(0.0,1.2,0.2)

labels = np.arange(0.0,1.2,0.2)

plt.plot(x, y)
plt.xticks(x, labels)
plt.show()

[Plot: the x-axis tick labels are printed with excess precision, e.g. 0.6000000000000001]

I had to use np.around(np.arange(0.0, 1.2, 0.2), 1) to avoid it, but if I just run np.arange(0.0, 1.2, 0.2) it gives array([0. , 0.2, 0.4, 0.6, 0.8, 1. ]), so why is the tick label different?

Also, the y axis does not use 0.60...01 as a label, which is also weird.

This issue is due to IEEE 754 float precision, and I think there should be a good solution for rounding the decimal numbers.


2 Answers

BEST ANSWER

The floating point representation of 0.6 matches a whole interval of real numbers: all real numbers from 0.59999999999999993 to 0.60000000000000003 share the same float64 representation.

Just try it:

import struct
struct.pack('d', 0.59999999999999992)
struct.pack('d', 0.59999999999999993)
struct.pack('d', 0.59999999999999994)
struct.pack('d', 0.59999999999999995)
struct.pack('d', 0.59999999999999996)
struct.pack('d', 0.59999999999999997)
struct.pack('d', 0.59999999999999998)
struct.pack('d', 0.59999999999999999)
struct.pack('d', 0.60000000000000000)
struct.pack('d', 0.60000000000000001)
struct.pack('d', 0.60000000000000002)
struct.pack('d', 0.60000000000000003)
struct.pack('d', 0.60000000000000004)

As you can see, all but the first and the last number have the same representation.
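A quick way to confirm this (a convenience check I'm adding here, not part of the original struct experiment) is to put the packed bytes of all thirteen literals into a set and count the distinct bit patterns:

import struct

# All thirteen decimal literals from above; only the first and the last
# parse to a different float64, so the set holds 3 distinct bit patterns.
literals = [0.59999999999999992, 0.59999999999999993, 0.59999999999999994,
            0.59999999999999995, 0.59999999999999996, 0.59999999999999997,
            0.59999999999999998, 0.59999999999999999, 0.60000000000000000,
            0.60000000000000001, 0.60000000000000002, 0.60000000000000003,
            0.60000000000000004]
print(len({struct.pack('d', v) for v in literals}))  # 3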

But that is not the only problem. The float64 object that represents any real number between 0.59999999999999993 and 0.60000000000000003 is displayed by Python as the "roundest" number of that interval, namely 0.6. This is why, when you type 0.6 in your Python interpreter, it doesn't reply 0.59999999999999993 nor 0.59999999999999999. (That would also have been an easier way to test this than struct, but I wanted to introduce struct: when you type 0.59999999999999994, Python replies 0.6, but when you type 0.59999999999999992, it says 0.5999999999999999.)
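For instance, this mirrors what the paragraph above describes (Python prints the shortest decimal string that maps back to the same float64):

# 0.59999999999999994 parses to the same float64 as 0.6,
# so Python displays it as the "roundest" member of the interval
print(0.59999999999999994)         # 0.6
print(0.59999999999999994 == 0.6)  # True
# 0.59999999999999992 falls just outside the interval, so it maps to the previous float64
print(0.59999999999999992)         # 0.5999999999999999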

The problem is that 0.2 doesn't have an exact representation either.

All real numbers from 0.19999999999999998 to 0.20000000000000002 share the same representation. And the only real number that representation is exactly equal to is 0.20000000000000001110223024625156540423631668090820...

I know this because:

import struct
b = struct.pack('d', 0.2)
x = struct.unpack('q', b)[0]        # 'q': 8-byte signed integer, so the sizes match on any platform
exponent = (x >> 52) & (2**11 - 1)  # 1020, i.e. an unbiased exponent of -3
mantissa = x & (2**52 - 1)          # 2702159776422298
mantissa += 2**52                   # add the implicit leading 1 of float64
# Check: mantissa/2**52 * 2**(exponent-1023) should be ~0.2
mantissa / 2**52 * 2**(exponent - 1023)  # 0.2
# To see the digits that a Python float64 can't show, take advantage of
# Python's arbitrary-precision integers and compute the value times 10**50
# using exact integer operations
10**50 * mantissa // (2**(52 + 1023 - exponent))
# 20000000000000001110223024625156540423631668090820

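As a simpler cross-check (not part of the original derivation, just the standard library decimal module), Decimal can print the exact value stored in a float64:

from decimal import Decimal

# Converting a float to Decimal is exact: it shows the full stored value
print(Decimal(0.2))
# 0.200000000000000011102230246251565404236316680908203125
print(Decimal(0.6))
# 0.59999999999999997779553950749686919152736663818359375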

Now, if you multiply that number by 3, you get 0.60000000000000003330669073875469621270895004272460...

Which is greater than 0.60000000000000003.

In other words, 0.2*3 and 0.6 don't have the same float64 representation.
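You can see this directly in the interpreter:

# The product rounds to a different float64 than the literal 0.6
print(0.2 * 3)          # 0.6000000000000001
print(0.2 * 3 == 0.6)   # False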

Now, when a numpy array is printed, its values are rounded a bit for display.

np.array([1.234567890123])

array([1.23456789])

This is just a display choice of numpy (which can be tweaked, by the way, with np.set_printoptions); it is how the array's __repr__ method works.

You can check that

np.array([1.234567890123])[0]

1.234567890123

Which is why you didn't see the numerical error when printing the range.

All digits are there. They are just not printed by numpy array's __repr__.

The same goes for 0.6:

np.arange(0,1.2,0.2)
#array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
np.arange(0,1.2,0.2)[3]
#0.6000000000000001
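If you want the array repr itself to reveal those digits, you can raise numpy's print precision (shown here as one illustrative setting; the exact spacing of the printed output may differ):

np.set_printoptions(precision=17)  # allow up to 17 fractional digits in the repr
np.arange(0, 1.2, 0.2)
# roughly: array([0. , 0.2, 0.4, 0.6000000000000001, 0.8, 1. ])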

As for how to avoid it:

  • You can do what you did: round the numbers a bit, which turns that value into 0.6 (again, not an exact 0.6, but at least a number whose float64 representation is the same as 0.6's, and which is therefore printed as "0.6").
  • Do nothing, and remove your xticks call. The behaviour you expect is already the default one.
  • If the problem is that, for some reason, the default behaviour is different on your machine (different resolution or something) and not all ticks are shown, use xticks to impose the tick positions, but do not set labels, and let the default formatter choose how they are printed (so you choose which ticks are printed, not how): plt.xticks(x)
  • If, on the contrary, it is not which ticks are printed but how they are printed that bothers you in the default behaviour, you can set the formatter you like (see the complete sketch after this list):
    import matplotlib.ticker as tk
    plt.gca().xaxis.set_major_formatter(tk.FormatStrFormatter('%.2f'))
    I deliberately chose 2 decimals so you can see the difference from the default.
  • Of course, you can do both: choose where the ticks go with the 1-argument xticks, and how they are printed with a formatter.
  • Lastly, as already suggested while I was typing this answer, if you need the 2-argument xticks to fix both the ticks and their labels (which I think should be avoided, since it redoes the formatter's job; I only do it when I need some exotic labels, such as xticks(x, ['zero', '1/5', '40%', '3/5', '80%', 'full'])), then pass explicit strings as labels. What would be the point of redoing the formatter's job if you still don't choose yourself how to print the non-string objects you passed?
    plt.xticks(x, [f'{t:.2f}' for t in x])
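Putting the formatter option together (the sketch referenced above; same data as the question, with the 1-argument xticks plus a FormatStrFormatter):

import matplotlib.pyplot as plt
import matplotlib.ticker as tk
import numpy as np

x = np.arange(0.0, 1.2, 0.2)
y = np.arange(0.0, 1.2, 0.2)

plt.plot(x, y)
plt.xticks(x)  # choose which ticks are shown...
plt.gca().xaxis.set_major_formatter(tk.FormatStrFormatter('%.2f'))  # ...and how they are printed
plt.show()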
ANSWER

The extra trailing digits come from floating point operations on decimal values that cannot be represented exactly. Read this answer along with its references.

For your problem at hand, you can reformat the labels like this:

plt.xticks(x, [f"{l:.1f}" for l in labels])

where .1 means one digit after the decimal point.