I'm new to clustering with the KMeans algorithm. I have a dataset with 17 variables, and I need to apply KMeans to two of them, say df[['feature1', 'feature2']]. According to the literature, the data should be standardized before clustering, so I standardized df[['feature1', 'feature2']] using StandardScaler.

The problem is that when determining the number of clusters K, I checked both the original and the standardized data and got different values of K from both the elbow method and the silhouette score:

1. Original (before applying StandardScaler): elbow K=2, silhouette K=2
2. Standardized (after applying StandardScaler): elbow K=3, silhouette K=4

My question is: which is preferred, option 1 or option 2?

Also, the sns.distplot() plots for options 1 and 2 have the same shape. In the original dataset I handled outliers using IQR capping, so there are no outliers left, although the original and standardized datasets are on different scales.
Likewise, when applying MinMaxScaler, the elbow method gives K=3 and the silhouette score gives K=5.
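
For reference, the comparison can be reproduced with a minimal sketch like the one below. It is not my actual code or data: the DataFrame is synthetic, and the helper name `best_k_by_silhouette`, the K range of 2 to 8, and the `random_state` values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in for the real dataset (assumption): three groups,
# with feature2 on a much larger numeric scale than feature1.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature1": np.concatenate([rng.normal(c, 1.0, 100) for c in (0, 5, 10)]),
    "feature2": np.concatenate([rng.normal(c, 300.0, 100) for c in (0, 1000, 0)]),
})

# Note the double brackets: this selects two columns as a DataFrame.
X = df[["feature1", "feature2"]].to_numpy()


def best_k_by_silhouette(data, k_values=range(2, 9)):
    """Hypothetical helper: fit KMeans for each k, return the k with the
    highest silhouette score plus per-k silhouette scores and inertias."""
    scores, inertias = {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
        scores[k] = silhouette_score(data, model.labels_)
        inertias[k] = model.inertia_  # values used for an elbow plot
    return max(scores, key=scores.get), scores, inertias


# The same two features under the three preprocessing options.
variants = {
    "original": X,
    "StandardScaler": StandardScaler().fit_transform(X),
    "MinMaxScaler": MinMaxScaler().fit_transform(X),
}

results = {}
for name, data in variants.items():
    k, scores, inertias = best_k_by_silhouette(data)
    results[name] = k
    print(f"{name}: best K by silhouette = {k}")
```

KMeans uses Euclidean distance, so on unscaled data the feature with the larger numeric range contributes most of the distance, which may be one reason the suggested K changes after scaling.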