Density-connected sets in OPTICS algorithm

I am confused about the OPTICS algorithm. A set of points can be considered a cluster if they are density-connected. A point p is density-connected to a point q if there is an object o such that both p and q are density-reachable from o with respect to epsilon and MinPts.

[image: optics_example]

In my case (epsilon=5, minPts=2, L1 norm, i.e. Manhattan distance), H is a core point, since it has more than 2 points within its epsilon distance. H is density-reachable from G and vice versa, because they share E. The same is true of H and S, because they share T. After all, E, T, S and G are within the epsilon range of H. In my opinion, E, G, H, S and T are in the same cluster.

If I run it with sklearn's OPTICS, I get the result shown in the picture, where H is a noise point.

Why are E, G, H, S and T not in the same cluster?

from sklearn.cluster import OPTICS
import numpy as np

data = np.array([[3,2], [2,5], [2,7], [1,8], [2,9], [2,8], [3,9], [7,9], [6,2], [7,1], [7,3], [7,2], [8,3], [9,2], [8,2], [8,1], [10,10], [10,11], [11,10], [11,11] ])

clustering = OPTICS(min_samples=2, max_eps=5.0, metric='manhattan').fit(data)
print('labels:', clustering.labels_)

which gives me:

labels: [ 0  1  1  1  1  1  1 -1  0  0  0  0  0  0  0  0  2  2  2  2]
          A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T

Answer by boraas (best answer)

The final clustering result is based on the xi-steep method (Figure 19 in the OPTICS paper; it is scikit-learn's default, cluster_method='xi') and not on the definition of density-reachability, which is actually how DBSCAN defines its final clustering.

In the xi-steep method, the algorithm detects the valleys or bumps in the reachability plot as clusters. In the reachability plot for this data (image not reproduced here), the reachability distance between H and S is relatively high, which is why H is labeled an outlier.
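
To see the difference between the two extraction methods on the data from the question, here is a minimal sketch, assuming a scikit-learn version that provides cluster_optics_dbscan. It prints the reachability values in processing order (the numbers behind the reachability plot) and then cuts the same ordering at a fixed eps, which is the DBSCAN-style extraction that matches the density-connected definition. The names list and the labels_at_eps variable are only illustrative and not part of the original post.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

# Same data and parameters as in the question
data = np.array([[3,2], [2,5], [2,7], [1,8], [2,9], [2,8], [3,9], [7,9],
                 [6,2], [7,1], [7,3], [7,2], [8,3], [9,2], [8,2], [8,1],
                 [10,10], [10,11], [11,10], [11,11]])
names = list("ABCDEFGHIJKLMNOPQRST")  # illustrative labels, one per row

clustering = OPTICS(min_samples=2, max_eps=5.0, metric='manhattan').fit(data)

# Reachability values in processing order: the raw numbers of the reachability plot.
# labels_ here comes from the default xi-based extraction.
for idx in clustering.ordering_:
    print(names[idx], clustering.reachability_[idx], clustering.labels_[idx])

# DBSCAN-style extraction: cut the same ordering at a fixed eps
labels_at_eps = cluster_optics_dbscan(
    reachability=clustering.reachability_,
    core_distances=clustering.core_distances_,
    ordering=clustering.ordering_,
    eps=5.0,
)
print('labels at eps=5.0:', labels_at_eps)
```

Going by the coordinates in the question, neighbouring groups are chained within Manhattan distance 5 (for example, G to H is 4 and H to Q is 4), so the cut at eps=5.0 should no longer mark H as noise. Passing cluster_method='dbscan' together with an eps value to OPTICS itself is another way to get this kind of extraction.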