fuzzy-c-means - setting initial number of clusters=6, but only 4 cluster labels generated

I am using the fuzzy-c-means clustering implementation, and I expect the data X to be split into the number of clusters I set in the algorithm (I believe that is how it works). However, the behavior is confusing.

cm = FCM(n_clusters=6)
cm.fit(X)

This code generates a plot with 4 labels - [0,2,4,6]

cm = FCM(n_clusters=4)
cm.fit(X)

This code generates a plot with 4 labels - [0,1,2,3]

I expect labels [0,1,2,3,4,5] when I set the number of clusters to 6.

code:

from fcmeans import FCM
from matplotlib import pyplot as plt
from seaborn import scatterplot as scatter

# fit the fuzzy-c-means
fcm = FCM(n_clusters=6)
fcm.fit(X)

# outputs
fcm_centers = fcm.centers
fcm_labels  = fcm.u.argmax(axis=1)

# plot result
%matplotlib inline
f, axes = plt.subplots(1, 2, figsize=(11,5))
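# pass x and y by keyword; recent seaborn versions no longer accept them positionally
scatter(x=X[:,0], y=X[:,1], ax=axes[0])
scatter(x=X[:,0], y=X[:,1], ax=axes[1], hue=fcm_labels)
scatter(x=fcm_centers[:,0], y=fcm_centers[:,1], ax=axes[1], marker="s", s=200)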
plt.show()

3 Answers

Answer 1:

I'm using fuzzy-c-means version 1.7.0:

>>> import fcmeans
>>> fcmeans.__version__
'1.7.0'

Using the Iris dataset:

>>> from sklearn.datasets import load_iris
>>> iris = load_iris().data
>>> model = fcmeans.FCM(n_clusters = 2)
>>> model.fit(iris)
>>> pred = model.predict(iris)
>>> from collections import Counter
>>> Counter(pred)
Counter({0: 97, 1: 53})

So n_clusters is applied correctly here.

Answer 2:

I read about it, and it looks like once the algorithm reaches the knee point (the maximum number of clusters it can form with the data), it won't create any more than that. So in my question, 4 was the maximum number of clusters that the algorithm could form with the given dataset.

Answer 3:

Fuzzy c-means is a fuzzy clustering algorithm.

The labels are only an approximation to the fuzzy assignment.
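
To make "fuzzy assignment" concrete, here is a minimal NumPy sketch of the standard fuzzy c-means membership update for fixed centers. The function name fcm_memberships, the toy points, and the fuzzifier m=2 are assumptions for illustration; this is not the fcmeans package's internal code.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Standard FCM membership update for fixed centers (m is the fuzzifier)."""
    # Pairwise Euclidean distances, shape (n_points, n_clusters)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # guard against a point sitting exactly on a center
    inv = d ** (-2.0 / (m - 1.0))
    # Normalize so each point's memberships sum to 1
    return inv / inv.sum(axis=1, keepdims=True)

# Toy data (hypothetical): three points on a line, two centers
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
u = fcm_memberships(X, centers)
print(u)
```

Every point gets a degree of membership in every cluster; a hard label keeps only the column with the largest value in each row.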

Most likely two clusters are pretty weak and hence never win the argmax operation used to produce the labels. That doesn't mean these clusters have not been used; you are just not using the full fuzzy result.
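
A small illustration of that effect, using a made-up membership matrix (the numbers are hypothetical, not taken from the asker's data). Cluster 1 carries the most total membership, yet it never has the largest value in any row, so it does not show up among the argmax labels.

```python
import numpy as np

# Hypothetical membership matrix: 5 points, 4 clusters, each row sums to 1
u = np.array([
    [0.40, 0.30, 0.20, 0.10],
    [0.35, 0.30, 0.25, 0.10],
    [0.10, 0.30, 0.45, 0.15],
    [0.15, 0.30, 0.40, 0.15],
    [0.38, 0.32, 0.20, 0.10],
])

labels = u.argmax(axis=1)   # hard labels, as in fcm.u.argmax(axis=1)
used = np.unique(labels)    # clusters that win at least one argmax
mass = u.sum(axis=0)        # total membership carried by each cluster

print("labels seen:", used)
print("membership per cluster:", mass)
```

Comparing np.unique(labels) against the column sums of the membership matrix is one way to check whether the "missing" clusters are really empty or are just never the top choice for any point.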