I have a scenario where I need to cluster operating system data. The actual data consists of 151 users using Windows, 27 users using MAC, and 5 users using Linux.
After clustering with the Carrot2 API using Lingo3gClusteringAlgorithm, I get the following clusters: MAC OS users (27), Linux users (5), and all Windows users end up in the Other Topics cluster. I would like the Windows users to form a separate cluster instead. Which clustering attributes do I need to configure to get that? Currently I only set "combined-cluster-score-balance" to 1.0. Any help is appreciated.
Both Carrot2 and Lingo3G are natural text clustering engines. You'll need at least a dozen documents, each consisting of at least a paragraph of text, to get sensible results.
Looking at your data, the text fields contain one word each, which is far too little for our algorithms to succeed. For your specific data you may need one of the generic clustering algorithms suitable for numeric and nominal data. Mahout and WEKA might be a good start.
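
To illustrate why a text clustering engine is not needed here: with a single nominal field, grouping records by their value already yields one cluster per operating system. The following is only a minimal sketch in plain Python, not Carrot2, Lingo3G, Mahout, or WEKA code. The `records` list is an assumption reconstructed from the counts in the question, and `group_by_value` is a hypothetical helper name.

```python
from collections import defaultdict

# Assumed input: one OS label per user, rebuilt from the counts in the question.
records = ["Windows"] * 151 + ["MAC"] * 27 + ["Linux"] * 5


def group_by_value(labels):
    """Group record indices by their normalized nominal value."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        # Normalize case and whitespace so "Windows " and "windows" match.
        groups[label.strip().lower()].append(index)
    return dict(groups)


clusters = group_by_value(records)
sizes = {label: len(members) for label, members in clusters.items()}
print(sizes)
```

Each key of `clusters` is one group and each value holds the indices of the users in it, so Windows users get their own group rather than falling into a catch-all bucket. If the real data has several numeric or nominal fields per user, a general-purpose clustering library is the more appropriate tool.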