Cluster centres with wrong dimensions in skfuzzy C-means clustering


Hello, I have written the simple code below to explore fuzzy C-means clustering:

import pandas as pd
import numpy as np
from os import listdir
from sklearn.model_selection import train_test_split
from skfuzzy.cluster import cmeans, cmeans_predict
from sklearn.metrics import classification_report,confusion_matrix

def find_csv_filenames( path_to_dir, suffix=".csv" ):
    filenames = listdir(path_to_dir)
    return [ path_to_dir+filename for filename in filenames if filename.endswith( suffix ) ]

listFiles = find_csv_filenames('<Path to folder with csv files>')
for files in listFiles:
    df = pd.read_csv(files)
    df.loc[df['bug']>1,'bug']=1
    df2 =df.iloc[:,3:]
    #Above are some pre processing steps
    #Below splitting data for test and train
    X_train, X_test = train_test_split(df2, test_size=0.30)
    #dropping bug column for unsupervised learning
    X_train2 = X_train.drop('bug',axis=1) 
    X_test2  = X_test.drop('bug',axis=1) 
    print (X_train2.shape)
    #Shape is (163, 20): 163 training samples with 20 features
    cntr, u, u0, d, jm, p, fpc = cmeans(X_train2,2,2,0.25,500,init=None, seed=None)
    print(cntr.shape)
    #The shape printed above is (2, 163)

The centres returned by cmeans above have shape (2, 163), but since my training data has only 20 features, the shape of cntr should have been (2, 20). I can't work out where I am going wrong.

Best answer

From the skfuzzy documentation:

data : 2d array, size (S, N)

Data to be clustered. N is the number of data sets; S is the number of features within each sample vector.
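
The effect of this layout on the shape of the centres can be sketched with NumPy alone. The snippet below is not skfuzzy's code: it applies the standard fuzzy c-means centre update to random memberships, and the helper names `fcm_centres` and `random_memberships` are made up for illustration. The second dimension of the result always equals the first dimension of the input, which is why a (163, 20) input gives (2, 163) centres.

```python
import numpy as np


def fcm_centres(data, u, m=2.0):
    """Standard fuzzy c-means centre update (illustrative helper, not skfuzzy's code).

    data : array of shape (S, N), S features by N samples
    u    : array of shape (c, N), membership of each sample in each cluster
    """
    um = u ** m  # fuzzified memberships, shape (c, N)
    # Weighted mean of the samples for every cluster: (c, N) @ (N, S) -> (c, S)
    return um @ data.T / um.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
samples = rng.random((163, 20))  # rows are samples, as in the question


def random_memberships(c, n):
    """Random memberships whose columns sum to 1 (hypothetical helper)."""
    u = rng.random((c, n))
    return u / u.sum(axis=0, keepdims=True)


# Samples-by-features input: the 163 rows are treated as features.
wrong = fcm_centres(samples, random_memberships(2, 20))
# Features-by-samples input: the 20 rows are treated as features.
right = fcm_centres(samples.T, random_memberships(2, 163))

print(wrong.shape)  # (2, 163)
print(right.shape)  # (2, 20)
```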

So you need to transpose your input so that features run along the rows and samples along the columns. Not tested, but this should work:

cmeans(X_train2.T, ...)