I am implementing a readability formula in Java based on this paper.
I have reached the point where I need to compute the conceptual and relational similarity of two or more words.
They say:
We use Latent Semantic Analysis (LSA) tools to compute word similarity. LSA can derive semantic information, including similarity, from a word-document co-occurrence matrix. Word/term co-occurrences are counted in a moving window of a fixed size that scans the entire corpus. The co-occurrence models using window sizes of ±1 and ±4 considered as relational similarity and conceptual semantic models, respectively.
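As far as I understand it, the windowed counting step could be sketched roughly like this. This is only my own reading of the passage: the class and method names (WindowCooccurrence, count, cosine) are made up for illustration and come from neither the paper nor any library, the toy sentence is invented, and the sketch compares raw count vectors directly, skipping the dimensionality reduction (SVD) that a real LSA tool would apply.

```java
import java.util.HashMap;
import java.util.Map;

public class WindowCooccurrence {

    /**
     * For every token, counts how often each other token occurs
     * within +-window positions of it (hypothetical helper).
     */
    public static Map<String, Map<String, Integer>> count(String[] tokens, int window) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            Map<String, Integer> row = counts.computeIfAbsent(tokens[i], k -> new HashMap<>());
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.length - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    row.merge(tokens[j], 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Cosine similarity between two sparse count vectors; 0.0 if either is empty. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy corpus, invented for this example only.
        String[] tokens = "the cat sat on the mat the dog sat on the rug".split(" ");
        // Window of +-1: what the paper calls the relational model.
        Map<String, Map<String, Integer>> relational = count(tokens, 1);
        // Window of +-4: what the paper calls the conceptual model.
        Map<String, Map<String, Integer>> conceptual = count(tokens, 4);
        System.out.println(cosine(relational.get("cat"), relational.get("dog")));
        System.out.println(cosine(conceptual.get("cat"), conceptual.get("dog")));
    }
}
```

What I can't tell is whether the paper's numbers come from raw vectors like these or from vectors after the LSA reduction, which is part of why I'm looking for a library.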
I looked at some implementations of LSA, like this one, but couldn't find a straightforward way to get what I want.
I assume I need to build a matrix from the words, so I tried using the WS4J library to compute one from two arrays of Strings.
WS4J also has a calcRelatednessOfWords() method, but the results it returns don't match the ones shown in the paper.
Is there any library that offers what I want? Or can anyone point me in the right direction?