Can I choose the distance metric in scikit-learn KMeans clustering?


I have generated a clustering using KMeans in Python's scikit-learn with the following code:

    from sklearn.cluster import KMeans

    clustering = KMeans(n_clusters=n_clusters, max_iter=300)
    clustering.fit(data_state_normalized)

This has been straightforward and it works as intended.

As far as I know, KMeans works by computing the distance between each cluster's centroid and the points assigned to it, and I assume this distance is Euclidean.
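
For example, after fitting, the inertia_ attribute equals the sum of squared Euclidean distances from each point to its assigned centroid, which I can verify by hand (a minimal sketch; the random array is just a stand-in for my data_state_normalized):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))  # stand-in for data_state_normalized

    km = KMeans(n_clusters=3, max_iter=300, n_init=10, random_state=0).fit(X)

    # Sum of squared Euclidean distances to each point's assigned centroid
    manual_inertia = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
    print(manual_inertia, km.inertia_)  # the two values agree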

I am interested in running KMeans with other distances, such as Manhattan. However, I cannot find how to do this in the KMeans documentation. I know it can be done with DBSCAN by defining a wrapper class:

    from scipy.spatial import distance
    from sklearn.cluster import DBSCAN

    class DBScan:

        def __init__(self, distance_metric='cityblock'):
            self.distance_metric = distance_metric

        def find_distance(self, x, metric='cityblock'):
            # Pairwise distances as a square matrix
            return distance.squareform(distance.pdist(x, metric))

        def fit_predict(self, data):
            if self.distance_metric == 'cityblock':
                data_dist = self.find_distance(data, metric='cityblock')
            else:
                data_dist = self.find_distance(data, metric='euclidean')
            # Hand the precomputed distance matrix to DBSCAN
            return DBSCAN(eps=15, min_samples=2,
                          metric='precomputed').fit_predict(data_dist)

and then

    dbscan_instance = DBScan()
    clusters = dbscan_instance.fit_predict(data_state_normalized)
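
(Incidentally, the wrapper is not strictly necessary for DBSCAN, since it accepts a metric name directly; a minimal equivalent sketch:)

    from sklearn.cluster import DBSCAN

    # 'cityblock' (Manhattan) is passed straight to DBSCAN, so no
    # precomputed distance matrix is needed
    clusters = DBSCAN(eps=15, min_samples=2,
                      metric='cityblock').fit_predict(data_state_normalized)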

Hierarchical clustering allows this as well:

    import scipy.cluster.hierarchy as sch

    # Note: SciPy's 'centroid', 'median' and 'ward' methods require the
    # Euclidean metric, so a metric-agnostic linkage such as 'average'
    # is used here with cityblock (Manhattan) distance
    Clustering_Jerarquico = sch.linkage(data_state_normalized,
                                        method='average', metric='cityblock')
    dendro = sch.dendrogram(Clustering_Jerarquico)
    clusters = sch.fcluster(Clustering_Jerarquico, t=15, criterion='distance')

So I am convinced the same should be possible for KMeans. Can someone tell me whether I am right, and if so, how to do it?
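
To make concrete what I am after, here is a rough, hypothetical sketch of the kind of loop I would expect if KMeans accepted a metric argument; with Manhattan distance the coordinate-wise median is the natural centre update (this is my own illustration, not an actual scikit-learn API):

    import numpy as np
    from scipy.spatial.distance import cdist

    def kmeans_manhattan(X, n_clusters, max_iter=300, seed=0):
        # Hypothetical Lloyd-style loop using cityblock (Manhattan) distance;
        # for simplicity, assumes no cluster becomes empty during iteration
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
        for _ in range(max_iter):
            # Assign each point to its nearest centre under Manhattan distance
            labels = cdist(X, centers, metric='cityblock').argmin(axis=1)
            # The coordinate-wise median minimises total Manhattan distance
            new_centers = np.array([np.median(X[labels == k], axis=0)
                                    for k in range(n_clusters)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers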
