Any Latent Semantic Indexing?

8.4k Views Asked by At

Is there any open source implementation of LSI in Java? I want to use that library for my project. I have seen jLSI but it implements some other model of LSI. I want a standard model.

6

There are 6 best solutions below

0
On BEST ANSWER

Have you considered LDA (Latent Dirichlet allocation)? I haven't really either, but I encountered the same problem with LSI recently (patents). From what I understand LDA is a related/more powerful technique. http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation apparently has some links to open-source implementations.

0
On

Have you tried the Semantic Vector package?

http://code.google.com/p/semanticvectors/

0
On

a google search for NLP tools provide this slides which i think helps ...

0
On

The S-Space Package has an open source version of LSA, with bindings for the LSI document vectors. (Both approaches operate on the same term-document matrix and are equivalent except in the output.) It's a fairly scalable approach that uses the thin-SVD. I've used it to run LSI on all of Wikipedia with no issue (after removing the infrequent terms with less than 5 occurrences).

As Scott Ray mentioned, the SemanticVectors package also has a good LSI implementation that recently switched to using the same thin-SVD (SVDLIBJ), so you might check that out as if you hadn't before.

0
On

I believe that LSA/LSI was patented in 1989, which means the patent should have just expired. Hopefully we will see some nice open source applications soon.

0
On

A google search for java LSI leads to a similar question that recommends SemanticVectors. A package built on top of Lucene that is 'similar' to LSI. I don't know if it's closer than the jLSI implementation.

That thread also mentions that LSI is patented and there aren't a lot of implementations of it. So if you need a standard implementation you may have to use a language other than java.