explanation of sklearn optics plot

804 Views Asked by At

I am currently learning how to use OPTICS in sklearn. I am inputting a numpy array of (205,22). I am able to get plots out of it, but I do not understand how I am getting a 2d plot out of multiple dimensions and how I am supposed to read it. I more or less understand the reachability plot, but the rest of it makes no sense to me. Can someone please explain what is happening. Is the function just simplifying the data to two dimensions somehow? Thank you enter image description here

1

There are 1 best solutions below

1
On

From the sklearn user guide:

The reachability distances generated by OPTICS allow for variable density extraction of clusters within a single data set. As shown in the above plot, combining reachability distances and data set ordering_ produces a reachability plot, where point density is represented on the Y-axis, and points are ordered such that nearby points are adjacent. ‘Cutting’ the reachability plot at a single value produces DBSCAN like results; all points above the ‘cut’ are classified as noise, and each time that there is a break when reading from left to right signifies a new cluster.

the other three plots are a visual representation of the actual clusters found by three different algorithms.

as you can see in the OPTICS Clustering plot there are two high density clusters (blue and cyan) the gray crosses acording to the reachability plot are classify as noise because of the low xi value

in the DBSCAN clustering with eps = 0.5 everithing is considered noise since the epsilon value is to low and the algorithm can not found any density points.

Now it is obvious that in the third plot the algorithm found just a single cluster because of the adjustment of the epsilon value and everything above the 2.0 line is considered noise.

please refer to the user guide: