String clustering using MATLAB?

I have a cell array of ~200k entries containing relatively short strings (1-2 words). I'm trying to cluster them based on string similarity. I've tried using Levenshtein distances to build a distance matrix (using a loop to compare each string to all other strings) so that I can run hierarchical or k-means clustering on it, but I'm confused about how to use the distance matrix once it is formed (specifically in MATLAB). Any ideas or suggestions would be greatly appreciated.

There is 1 solution below

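k-means cannot operate on distance matrices.

It uses only means and squared deviations from the mean (i.e., variance), so it needs coordinates from which a mean can be computed. A matrix of pairwise Levenshtein distances provides none.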
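
For reference, the k-means objective can be written as below. The notation is introduced here for illustration and is not from the original answer: x_i are the data points, C_k the clusters, and \mu_k the cluster means.

```latex
$$
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
$$
```

Computing \mu_k means adding points and dividing by a count, which is not defined for strings, and the pairwise distances alone do not supply it.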

Hierarchical clustering works fine on distance matrices. In MATLAB, see the linkage documentation for how to pass a precomputed distance matrix.
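
A minimal sketch of that workflow, assuming the Statistics and Machine Learning Toolbox is available (for squareform, linkage, cluster and dendrogram) and a release that allows local functions in scripts (R2016b or later). The sample strings, the helper name levenshtein, the 'average' linkage method and the cutoff of 3 are all placeholders chosen for illustration, not values from the question.

```matlab
% Hypothetical sample data; replace with the real cell array of strings.
names = {'apple', 'appel', 'aple', 'banana', 'bananna', 'cherry'};
n = numel(names);

% Full symmetric distance matrix with a zero diagonal.
D = zeros(n);
for i = 1:n-1
    for j = i+1:n
        D(i, j) = levenshtein(names{i}, names{j});
        D(j, i) = D(i, j);
    end
end

% linkage expects the condensed vector form that pdist produces;
% squareform converts a square distance matrix into that form.
y = squareform(D);

% 'average' is one method that is valid for arbitrary distances.
% 'ward', 'centroid' and 'median' are intended for Euclidean distances.
Z = linkage(y, 'average');

% Cut the tree at an (assumed) distance threshold to get cluster labels.
labels = cluster(Z, 'cutoff', 3, 'criterion', 'distance');

% Optional: inspect the tree to choose a sensible cutoff.
dendrogram(Z);

% Illustrative local helper (not a built-in): Levenshtein distance
% between two character vectors via two-row dynamic programming.
function d = levenshtein(a, b)
    prev = 0:numel(b);
    for i = 1:numel(a)
        curr = zeros(1, numel(b) + 1);
        curr(1) = i;
        for j = 1:numel(b)
            cost = double(a(i) ~= b(j));   % 0 if characters match, else 1
            curr(j + 1) = min([prev(j + 1) + 1, curr(j) + 1, prev(j) + cost]);
        end
        prev = curr;
    end
    d = prev(end);
end
```

Note that a full pairwise matrix for ~200k strings has about 2 x 10^10 distinct pairs, roughly 160 GB in double precision, so this approach is only practical on a reduced set, for example after removing duplicates with unique or on a sample.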