Why are the K-means cluster labels correct but the centroids are not near the cluster centers?

I don't understand why the centroids are jammed into the lower-left corner of the plot, even though the points are colored with three distinct cluster labels.

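import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# df is the two-column DataFrame printed in the output further down
print(df.info())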
print(df)
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(), ['State'])
    ], remainder='passthrough'
)

kmeans = Pipeline([
    ('preprocessor', preprocessor),
    ('kmeans', KMeans(n_clusters = 3, random_state=0, n_init = "auto"))
]).fit(df)

labels = kmeans['kmeans'].labels_
print("Cluster Labels:", labels)

centroids = kmeans['kmeans'].cluster_centers_
print("Centroids:", centroids)

plt.scatter(df['SumOfTotalPrice'], df['State'], c = labels)
plt.scatter(centroids[:, 0], centroids[:, 1], marker='*', s=200, c='#050505')
plt.xlabel('SumOfTotalPrice')
plt.ylabel('State')
plt.show()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11 entries, 0 to 10
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   State            11 non-null     object 
 1   SumOfTotalPrice  11 non-null     float64
dtypes: float64(1), object(1)
memory usage: 304.0+ bytes
None
   State  SumOfTotalPrice
0     AK     1.063432e+07
1     CA     4.172891e+07
2     IL     2.103149e+07
3     IN     2.270681e+08
4     KY     4.144238e+07
5     ME     2.057557e+07
6     MI     4.216375e+07
7     OH     7.970354e+08
8     PA     2.158148e+07
9     SD     1.025623e+07
10    TX     2.061534e+07
Cluster Labels: [0 0 0 2 0 0 0 1 0 0 0]
Centroids: [[1.11111111e-01 1.11111111e-01 1.11111111e-01 0.00000000e+00
  1.11111111e-01 1.11111111e-01 1.11111111e-01 0.00000000e+00
  1.11111111e-01 1.11111111e-01 1.11111111e-01 2.55588301e+07]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 7.97035399e+08]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
  0.00000000e+00 0.00000000e+00 0.00000000e+00 2.27068150e+08]]

Graph generated by the code
