I started to experiment using the scipy.spatial.ConvexHull
function, that (if I understood correctly) is a wrapper for the qhull
C library. I am using SciPy 0.19.1 with Python3.
I first worked with a real-world dataset that features 700 points in 21 dimensions, and scipy.spatial.ConvexHull
crashes, with this error: scipy.spatial.qhull.QhullError: QH6235 qhull error (qh_memalloc): negative request size (-2003053336). Did int overflow due to high-D?
.
After a few attempts using the following sample Python3 code:
import numpy as np
X = np.random.randn(40,21)
print("Computing convex hull of X (shape: " + str(X.shape) + ")...")
from scipy.spatial import ConvexHull
hull = ConvexHull(X)
I managed to narrow the issue down to dimensionality. With 39 randomly generated points in 21 dimensions, it works. With 40 points, sometimes it crashes, sometimes it succeeds. I am not sure, but it seems there's a memory allocation error?
- Is there a way to avoid the memory problem? Are 700 points too many for convex hull algorithms?
- Skimming through search results in Google, I noticed that there are some algorithms to compute an approximation of a convex hull. Do you recommend them? Could they work in my situation? Is there already a Python implementation for some of them?
- Potentially I would like to compute convex hulls of high-dimensional spaces, up to 100,000 dimensions. Is this madness, or can it be done in a sensible way?
TLDR: ConvexHull with struggle in high dimensions
ConvexHull uses qhull. I've played with an equivalent implementation in matlab (also used qhull) with a problem in high dimensions owing to the large amount of memory required. I contacted the developers who confirmed this to be an issue.
For help in monitoring Qhull's memory performance see 'Performance of Qhull" http://www.qhull.org/html/qh-code.htm#performance
"When to use Qhull" may also be of help http://www.qhull.org/html/index.htm#when