I have generated a clustering using K-means in Python's scikit-learn with the following code:
from sklearn.cluster import KMeans

clustering = KMeans(n_clusters=n_clusters, max_iter=300)
clustering.fit(data_state_normalized)
This has been straightforward and works as intended.
As far as I know, K-means works by computing the distances between each cluster's centroid and the points assigned to it, and I assume that this distance is Euclidean.
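To check my understanding, this minimal sketch (assuming data_state_normalized is a NumPy array and clustering is the fitted object from above) reproduces scikit-learn's inertia_ as the sum of squared Euclidean distances from each point to its nearest centroid:

import numpy as np

# Centroid assigned to each point, looked up via the fitted labels
assigned = clustering.cluster_centers_[clustering.labels_]
# Summing the squared Euclidean distances reproduces clustering.inertia_
sq_dists = ((data_state_normalized - assigned) ** 2).sum(axis=1)
print(np.isclose(sq_dists.sum(), clustering.inertia_))  # True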
I am interested in running K-means with other distances, such as Manhattan (cityblock). However, I cannot find how to do this in the KMeans documentation. I know that this can be done in DBSCAN by defining a wrapper class:
from scipy.spatial import distance
from sklearn.cluster import DBSCAN

class DBScan:
    def __init__(self, distance='cityblock'):
        self.distance = distance

    def find_distance(self, x, metric='cityblock'):
        # Square-form pairwise distance matrix for the chosen metric
        return distance.squareform(distance.pdist(x, metric))

    def fit_predict(self, data):
        if self.distance == 'cityblock':
            data_dist = self.find_distance(data, metric='cityblock')
        else:
            data_dist = self.find_distance(data, metric='euclidean')
        # metric='precomputed' makes DBSCAN consume the distance matrix directly
        return DBSCAN(eps=15, min_samples=2, metric='precomputed').fit_predict(data_dist)
and then
dbscan_instance = DBScan()
clusters = dbscan_instance.fit_predict(data_state_normalized)
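If I understand correctly, metric='precomputed' is what lets DBSCAN consume the square distance matrix built by pdist/squareform, so any metric that pdist supports can be plugged in this way.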
Hierarchical clustering allows this as well:
import scipy.cluster.hierarchy as sch

# SciPy's 'median' linkage requires Euclidean distances, so 'average' is used here
Clustering_Jerarquico = sch.linkage(data_state_normalized, 'average', metric='cityblock')
dendrogram = sch.dendrogram(Clustering_Jerarquico)
clusters = sch.fcluster(Clustering_Jerarquico, t=15, criterion="distance")
So I am convinced that this should be possible for K-means as well. Can someone tell me whether I am right, and how to do it?
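To make the question concrete, what I am after is roughly this hand-rolled sketch (a Lloyd-style loop with cityblock assignment and coordinate-wise median updates; kmedians_cityblock and everything in it are my own names, not a scikit-learn API):

import numpy as np
from scipy.spatial import distance

def kmedians_cityblock(x, k, n_iter=300, seed=0):
    # With the Manhattan metric the per-cluster minimiser is the
    # coordinate-wise median rather than the mean
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid under cityblock distance
        labels = distance.cdist(x, centroids, metric='cityblock').argmin(axis=1)
        new_centroids = np.array([
            np.median(x[labels == j], axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

labels, centers = kmedians_cityblock(data_state_normalized, n_clusters)

But if KMeans itself exposes a metric option that I have missed, that would obviously be preferable.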