I have run Mahout's ssvd to apply LSA (latent semantic analysis). I have text documents, each containing many features (from 100 to 2000 terms). I would like to use LSA on the documents to get the top terms or phrases that appear together, i.e. the "concepts". Does anyone have an idea how I can do that?

So far I have applied preprocessing (tokenization, stopword removal, stemming, ...), created the tfidf vectors with Mahout, and then run the ssvd command:

    bin/mahout ssvd -i termVectors/tfidf-vectors/part-r-00000 -no Output Folder -c 200 -us true -U false -V false -t 1 -ow -pca true

I use clusterdump in Mahout to parse the results, but all the terms in the results start with the letter "a" and do not represent any concept.

Does anyone have experience with ssvd for reducing the features before clustering? Or any idea how to use ssvd to show the concepts in a text corpus?
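To make the goal concrete, here is a minimal sketch in plain Python (not Mahout) of the kind of output I am after. Everything in it is an assumption for illustration: the toy vocabulary and tf-idf matrix are made up, the helper names `top_right_singular_vectors` and `top_terms` are hypothetical, and simple power iteration with deflation stands in for Mahout's stochastic SVD. With documents as rows and terms as columns, the per-term weights of each concept are the entries of the right singular vectors (the V matrix), so the sketch ranks terms by the absolute value of those entries.

```python
import math

def top_right_singular_vectors(A, k, iters=200):
    """Top-k right singular vectors of A (rows = documents, columns = terms),
    found by power iteration on A^T A with deflation."""
    n_terms = len(A[0])
    # Gram matrix G = A^T A (terms x terms); its eigenvectors are the
    # right singular vectors of A.
    G = [[sum(row[i] * row[j] for row in A) for j in range(n_terms)]
         for i in range(n_terms)]
    vectors = []
    for c in range(k):
        # Deterministic, non-uniform start vector (an arbitrary choice).
        v = [1.0 / (i + 1 + c) for i in range(n_terms)]
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n_terms))
                 for i in range(n_terms)]
            # Deflation: remove components along vectors already found.
            for u in vectors:
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        vectors.append(v)
    return vectors

def top_terms(vector, vocab, n):
    """The n terms with the largest absolute weight in one concept vector."""
    ranked = sorted(range(len(vocab)), key=lambda i: abs(vector[i]), reverse=True)
    return [vocab[i] for i in ranked[:n]]

# Made-up toy data: 5 documents over 6 terms, two obvious topics.
vocab = ["cat", "dog", "pet", "stock", "market", "trade"]
tfidf = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 0, 2, 3, 2],
]

concepts = [top_terms(v, vocab, 3)
            for v in top_right_singular_vectors(tfidf, 2)]
for idx, terms in enumerate(concepts):
    print(f"concept {idx}: {sorted(terms)}")
```

On this toy matrix the two concepts separate the finance terms from the pet terms, which is the kind of grouping I would like to obtain from the ssvd output on my real corpus.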
Thank you