imshow plotting very large integers, but "dtype object cannot be converted to float"


I have the following code, plotting a function on a grid, where the function happens to have a very large integer value:

import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter, FuncFormatter
import numpy as np # thanks to user @simon pointing out I had forgotten this

p = 13
counts = [[0 for x in range(p)] for y in range(p)]
counts[0][0] = 1000000000
unique_counts = np.unique(counts)
plt.imshow(counts, cmap='viridis', origin='lower', extent=[0, p-1, 0, p-1])
cbar = plt.colorbar(ticks=unique_counts, format=ScalarFormatter(useOffset=False))
cbar.ax.yaxis.set_major_formatter(FuncFormatter(lambda x, _: format(int(x), ',')))  # Format tick labels with commas
plt.show()

Running this in Google Colab works perfectly fine and produces the expected plot.

However, if I increase counts[0][0] to, say, 100000000000000000000 (10**20), I get the following error:

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-12-0ec4c2551685> in <cell line: 8>()
      6 counts[0][0] = 100000000000000000000
      7 unique_counts = np.unique(counts)
----> 8 plt.imshow(counts, cmap='viridis', origin='lower', extent=[0, p-1, 0, p-1])
      9 cbar = plt.colorbar(ticks=unique_counts, format=ScalarFormatter(useOffset=False))
     10 cbar.ax.yaxis.set_major_formatter(FuncFormatter(lambda x, _: format(int(x), ',')))  # Format tick labels with commas

3 frames

/usr/local/lib/python3.10/dist-packages/matplotlib/image.py in set_data(self, A)
    699         if (self._A.dtype != np.uint8 and
    700                 not np.can_cast(self._A.dtype, float, "same_kind")):
--> 701             raise TypeError("Image data of dtype {} cannot be converted to "
    702                             "float".format(self._A.dtype))
    703 

TypeError: Image data of dtype object cannot be converted to float

I would very much like to be able to plot functions that take very large integer values with exact precision (so rounding or using floats would not be good). Is this possible?

EDIT: Someone was understandably confused by this seemingly useless level of precision in a plot. To clarify: what actually matters to me is being able to read the exact values off the colorbar labels (for number theory applications, I need the exact number of points on some varieties mod p). So I'm OK with the plot itself being slightly off, but I really do want the colorbar labels to be exact.

Answer by simon (accepted)

New answer

(For my original answer, see the section below.)

The question's update makes clear that the essential information to retain is the exact integer value in each colorbar tick label. The crucial idea of my updated answer is:

  • For the colorbar tick positions and the image data, use floating point values (these are the only ones that Matplotlib can deal with internally; see the original answer below).
  • For the colorbar tick labels, use the original integer values: pass them to Matplotlib as a list of already formatted strings.

Here is the corresponding code:

import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter
import numpy as np

p = 13
counts = [[0 for x in range(p)] for y in range(p)]
# Provide some huge ints for demonstration purposes
counts[ 0][ 0] = 100000000000000000008
counts[ 0][-1] = counts[ 0][ 0] // 2
counts[-1][ 0] = counts[ 0][-1] // 2
counts[-1][-1] = counts[-1][ 0] // 2
# Get the unique values (without Numpy, just to be sure)
unique_counts = sorted(set(val for row in counts for val in row))
# Provide the image and tick *positions* as float values to avoid casting error
counts_img = np.array(counts, dtype=float)
counts_ticks = [float(val) for val in unique_counts]
# Provide the tick *labels* as strings generated from the original integer vals
counts_ticks_labels = [f'{val:,}' for val in unique_counts]
# Display everything
plt.imshow(counts_img, cmap='viridis', origin='lower', extent=[0, p-1, 0, p-1])
cbar = plt.colorbar(format=ScalarFormatter(useOffset=False))
cbar.set_ticks(ticks=counts_ticks, labels=counts_ticks_labels)
plt.show()
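Why positions and labels are kept apart can be checked in plain Python with the same demonstration values as above: the float tick positions lose the trailing digits, while the string labels, built from the original ints, keep them. This is a minimal sketch that only rebuilds the tick lists, without plotting.

```python
# Same demonstration values as in the snippet above
big = 100000000000000000008
unique_counts = sorted({0, big, big // 2, big // 4, big // 8})

# Float positions (what Matplotlib uses) vs. exact string labels (what we display)
tick_positions = [float(v) for v in unique_counts]
tick_labels = [f'{v:,}' for v in unique_counts]

for v, pos, label in zip(unique_counts, tick_positions, tick_labels):
    # int(pos) == v tells whether the float position still equals the exact value
    print(f'{label:>30}  position={pos!r}  exact as float: {int(pos) == v}')
```

Only the position of each tick is approximate; the text shown next to it is exact.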

In older versions of Matplotlib, where Colorbar.set_ticks() does not accept a labels argument, you might need to replace the last three lines with the following:

cbar = plt.colorbar(ticks=counts_ticks, format=ScalarFormatter(useOffset=False))
cbar.ax.set_yticklabels(counts_ticks_labels)
plt.show()

And here is the resulting plot: [plot resulting from the provided code]

Original answer

Short answer

I currently do not see a way to pass huge integers to imshow() exactly, because Matplotlib internally relies on Numpy arrays for holding the image data. If you can live with approximate values, use

counts[0][0] = float(100000000000000000000)
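How approximate the values become depends on their magnitude. A Python float is an IEEE 754 double with a 53-bit significand, so every integer up to 2**53 converts exactly, but beyond that neighbouring integers start to collapse onto the same float. The helper name survives_float below is hypothetical, chosen for illustration:

```python
def survives_float(n: int) -> bool:
    """Hypothetical helper: True if n -> float -> int round-trips unchanged."""
    return int(float(n)) == n

print(survives_float(2**53))       # True: all integers up to 2**53 are exact
print(survives_float(2**53 + 1))   # False: first integer a double cannot hold
print(survives_float(10**20))      # True: this particular power of ten is exact
print(survives_float(10**20 + 1))  # False: but its neighbour is rounded
```

So float(100000000000000000000) happens to be exact, while most integers of that size are not.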

Long answer

The reason for the error you see is that Matplotlib internally converts your nested list of image data to a Numpy array before displaying it. In the Matplotlib version current at the time of writing, this happens in cbook.safe_masked_invalid(), which is called by _ImageBase._normalize_image_array(), which is called by _ImageBase.set_data(), which is called by Axes.imshow().

The chain of problems here is the following:

  1. Huge integers (i.e. integers too large for any of Numpy's fixed-width integer types; the int64 maximum, for example, is 9,223,372,036,854,775,807) are converted to Numpy's object data type by default. This happens for your data with counts[0][0] = 100000000000000000000, but not with counts[0][0] = 1000000000. You can easily check the corresponding Numpy behavior as follows:

    str(np.array([100000000000000000000]).dtype)
    # >>> 'object'
    str(np.array([1000000000]).dtype)
    # >>> 'int64'
    

    In Matplotlib, as already mentioned, this happens in cbook.safe_masked_invalid(); more precisely, it happens in the line x = np.array(x, subok=True, copy=copy), where x refers to your nested list counts.

  2. After that, _ImageBase._normalize_image_array() checks whether the resulting array's data type is uint8 or can be cast to the float data type (using "same_kind" casting). Neither is true for Numpy's object data type, so the error is raised.
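Both steps can be reproduced with Numpy alone. The sketch below mirrors, in simplified form, the dtype check visible in the traceback; the helper name imshow_would_accept is hypothetical and not part of Matplotlib:

```python
import numpy as np

def imshow_would_accept(data) -> bool:
    """Hypothetical helper mirroring the check in the traceback: the array
    must be uint8 or castable to float under 'same_kind' casting."""
    arr = np.asarray(data)  # step 1: list -> array (huge ints give dtype object)
    return arr.dtype == np.uint8 or np.can_cast(arr.dtype, float, "same_kind")

small = [[1000000000, 0], [0, 0]]
huge = [[100000000000000000000, 0], [0, 0]]

print(np.asarray(small).dtype, imshow_would_accept(small))  # an integer dtype, True
print(np.asarray(huge).dtype, imshow_would_accept(huge))    # object, False
```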

To avoid this chain of problems, the only option I see is to convert your data to float values (or to a float array) yourself before passing it to imshow(), at least once the values become too big for Numpy's integer types.
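A minimal sketch of that conversion, which also reports which values were rounded so you know where the colors are only approximate. The helper to_float_image is a hypothetical name; its output would be passed to imshow() instead of the original list:

```python
import numpy as np

def to_float_image(counts):
    """Hypothetical helper: convert a nested list of (possibly huge) Python
    ints to a float64 array, and list the values that did not survive exactly."""
    img = np.array(counts, dtype=float)  # each int goes through float()
    inexact = sorted({v for row in counts for v in row if int(float(v)) != v})
    return img, inexact

p = 13
counts = [[0 for x in range(p)] for y in range(p)]
counts[0][0] = 100000000000000000001
img, inexact = to_float_image(counts)
print(img.dtype, inexact)
```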

Answer by JD2911

Floats cannot represent very large integers exactly. The following code worked for me; I hope this is the result you want. In general, very large numbers are more readable in scientific notation than written out in full.

import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter
import numpy as np

p = 13
counts = [[0 for x in range(p)] for y in range(p)]
counts[0][0] = 100000000000000000000000

# Convert counts to a NumPy array
counts_array = np.array(counts, dtype=float)

# Create the plot
plt.imshow(counts_array, cmap='viridis', origin='lower', extent=[0, p-1, 0, p-1])
cbar = plt.colorbar()

cbar.set_ticks([np.min(counts_array), np.max(counts_array)])
cbar.ax.yaxis.set_major_formatter(ScalarFormatter(useOffset=False, useMathText=True))
cbar.update_ticks()

plt.show()

In the code above, imshow() and the plot work, but the large numbers are still not shown in full on the colorbar. Here is a new snippet that prints the numbers: since I couldn't get Matplotlib to print them by itself, it formats the colorbar tick labels explicitly as strings.

import matplotlib.pyplot as plt
import numpy as np

p = 13
counts = [[0 for x in range(p)] for y in range(p)]
counts[0][0] = 1000000000000000000000

# Convert counts to a NumPy array
counts_array = np.array(counts, dtype=float)

# Create the plot
plt.imshow(counts_array, cmap='viridis', origin='lower', extent=[0, p-1, 0, p-1])
cbar = plt.colorbar()

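# Fix the automatically chosen tick positions first (set_ticklabels expects a
# fixed set of ticks), then label them with comma-separated integers
ticks = cbar.get_ticks()
cbar.set_ticks(ticks)
cbar.set_ticklabels([f'{int(x):,}' for x in ticks])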

plt.show()