Python DBSCAN - How to plot clusters based on mean of vectors?


Hi, I have computed the mean of my vectors and used DBSCAN to cluster them. However, I am unsure how to plot the results, since my data does not have an [x, y, z, ...] format.

Sample dataset:

mean_vec = [[2.2771908044815063],
 [3.0691280364990234],
 [2.7700443267822266],
 [2.6123080253601074],
 [2.6043469309806824],
 [2.6386525630950928],
 [2.7034034729003906],
 [2.3540258407592773]]

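I used the code below (from the scikit-learn example) to obtain my clusters:

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(mean_vec)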
db = DBSCAN(eps = 0.15, min_samples = 5).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)

print('Estimated number of clusters: %d' % n_clusters_)

Is it possible to plot my clusters? The plotting code from the scikit-learn example does not work for me. The scikit-learn link can be found here

Answer by Has QUIT--Anony-Mousse:

On one-dimensional data, use kernel density estimation (KDE) rather than DBSCAN. It is much better supported by theory and much better understood. One can see DBSCAN as a fast approximation to KDE for the multivariate case.
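As a rough illustration of the KDE idea, here is a minimal sketch in plain Python: estimate the density on a grid, then cut the axis at the local minima of the density so that each stretch between two cuts becomes one cluster. The function names, the Gaussian kernel, the grid size and the bandwidth value are all my own assumptions for the example, not something from the question or from scikit-learn, and the bandwidth in particular needs tuning for real data.

```python
import bisect
import math


def gaussian_kde_1d(points, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `points` at each grid value."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in points)
        for g in grid
    ]


def kde_cluster_1d(points, bandwidth, grid_size=200):
    """Split 1-D points at the local minima of their estimated density.

    Returns (labels, cuts): labels[i] is the index of the interval that
    points[i] falls into, cuts are the grid positions of the density minima.
    """
    lo, hi = min(points), max(points)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    density = gaussian_kde_1d(points, grid, bandwidth)

    # Interior grid positions where the density stops falling and starts rising.
    cuts = [
        grid[i]
        for i in range(1, grid_size - 1)
        if density[i] < density[i - 1] and density[i] <= density[i + 1]
    ]

    # A point's label is the number of cut positions lying below it.
    labels = [bisect.bisect_left(cuts, p) for p in points]
    return labels, cuts


if __name__ == "__main__":
    # Data from the question, flattened from [[v], [v], ...] to [v, v, ...].
    mean_vec = [[2.2771908044815063], [3.0691280364990234],
                [2.7700443267822266], [2.6123080253601074],
                [2.6043469309806824], [2.6386525630950928],
                [2.7034034729003906], [2.3540258407592773]]
    values = [row[0] for row in mean_vec]
    # bandwidth=0.05 is an arbitrary illustrative choice, not a recommendation.
    labels, cuts = kde_cluster_1d(values, bandwidth=0.05)
    print(labels, cuts)
```

A smaller bandwidth gives more, narrower clusters, and a larger one merges them, so it plays a role similar to eps in DBSCAN.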

Anyway, plotting one-dimensional data is not that hard. For example, you can plot a histogram.

Also, the clusters will necessarily correspond to intervals, so you can plot a line from the min to the max of each cluster.
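The interval idea could be sketched roughly as follows. The helper names `cluster_intervals` and `plot_intervals` are made up for this example. The code assumes `values` is the flattened data (one number per sample) and `labels` is what DBSCAN returned, with -1 marking noise as in scikit-learn. matplotlib is imported inside the plotting function so that the interval helper can be used without it.

```python
def cluster_intervals(values, labels, noise_label=-1):
    """Return {label: (min, max)} for every cluster, skipping noise points."""
    intervals = {}
    for value, label in zip(values, labels):
        label = int(label)
        if label == noise_label:
            continue
        if label in intervals:
            lo, hi = intervals[label]
            intervals[label] = (min(lo, value), max(hi, value))
        else:
            intervals[label] = (value, value)
    return intervals


def plot_intervals(values, labels):
    """Draw one horizontal line per cluster, from its min to its max value."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    fig, ax = plt.subplots()
    for label, (lo, hi) in sorted(cluster_intervals(values, labels).items()):
        ax.hlines(y=label, xmin=lo, xmax=hi, linewidth=4)
    ax.set_xlabel("value")
    ax.set_ylabel("cluster label")
    return fig
```

With the data from the question you would pass `[row[0] for row in mean_vec]` as `values` and `db.labels_` as `labels`, then call `plt.show()` or save the returned figure.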

You can even abuse 2D scatter plots: simply use the label as the y value.
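A minimal sketch of the scatter plot trick, assuming matplotlib is installed. The function names `flatten` and `plot_labels_as_y` are hypothetical, and the noise label -1 follows the scikit-learn convention, so noise points end up on their own row at y = -1.

```python
def flatten(mean_vec):
    """Turn [[v1], [v2], ...] into [v1, v2, ...]."""
    return [row[0] for row in mean_vec]


def plot_labels_as_y(mean_vec, labels):
    """Scatter the 1-D values on the x axis and their cluster label on the y axis."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    xs = flatten(mean_vec)
    ys = [int(label) for label in labels]  # also accepts a NumPy label array

    fig, ax = plt.subplots()
    ax.scatter(xs, ys, c=ys, cmap="viridis")
    ax.set_xlabel("mean vector value")
    ax.set_ylabel("cluster label (-1 = noise)")
    ax.set_yticks(sorted(set(ys)))
    return fig
```

For the code in the question this would be called as `plot_labels_as_y(mean_vec, db.labels_)`, followed by `plt.show()`. Note that the x axis here shows the original values, not the standardized `X`.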