I am looking to cluster short text documents, each a few hundred characters long.
I have been using the Carrot2 Workbench and I really like its capabilities, but the API is archaic and difficult to understand and use.
I am looking for a replacement that has similar capabilities (clustering algorithms) but with a better API.
I'm looking for something in Java or Python, and it has to be open source and free as in beer.
So LingPipe (http://alias-i.com/lingpipe/) does not qualify.
Thanks.
scikit-learn is a Python library that supports a wide range of machine learning algorithms (including clustering) and is very well documented.
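
To give a feel for the API, here is a minimal sketch of clustering a handful of short texts with TF-IDF features and k-means. The sample documents, the choice of two clusters, and the variable names are assumptions made for illustration; k-means is just one of the clustering algorithms scikit-learn offers and may not be the best fit for your data.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative sample documents (made up for this sketch).
docs = [
    "The cat sat on the mat and purred.",
    "Dogs and cats are popular household pets.",
    "The stock market fell sharply on Monday.",
    "Investors worry about rising interest rates.",
]

# Turn each document into a sparse TF-IDF vector, dropping English stop words.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Cluster the vectors; n_clusters=2 is an assumption, tune it for real data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Show which cluster each document was assigned to.
for label, doc in zip(labels, docs):
    print(label, doc)
```

Unlike Carrot2, k-means does not produce cluster labels (descriptive phrases) on its own; one rough substitute is to inspect the highest-weighted terms in each row of `kmeans.cluster_centers_`.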