Standalone, open source Java library for document clustering, similar to Carrot2

I am looking to cluster short text documents, each a few hundred characters long.

I have been using the Carrot2 Workbench and I really like its capabilities, but the API is archaic and difficult to understand and use.

I am looking for a replacement that has similar capabilities (clustering algorithms) but with a better API.

I'm looking for something in Java or Python, and it has to be open source and free as in beer.

So LingPipe (http://alias-i.com/lingpipe/) does not qualify.

Thanks.

There is 1 solution below

scikit-learn is a Python library that supports a wide range of machine learning algorithms (including clustering) and is very well documented.