I am trying to run a SVD job in mahout. I have a matrix (say A) created (Document x term) of size 372053 x 21338 (21338 no of unique words say N, 372053 documents say M). So my matrix A is of size (M*N). I ran the svd using mahout and i got the cleaned eigen vectors (i gave the expected rank as 200 say R). Now i have a eigen vectors matrix created of size R*N.
Stating the SVD equation
A = U * S * V' (V' being transpose of V)
I need to convert the matrix A to the new space, to get the compressed vectors of the documents (I am trying to implement LSI)
What is the output i get from mahout SVD? (I would like to know in terms of the equation above) I read mailing list that we can get the eigen values from the NamedVectors in the generated eigen vectors matrix.
Please guide me on how to proceed from here to generate the document-term matrix A in the new space (of size M*R).
Any help is highly appreciated :)
A good starting point for LSI with Stochastic SVD on Mahout can be found here. The good part is that the paper describes also the folding in process and is explicit on the output format in terms of the svd equation.
The work is integrated in the latest version 0.8 and can be used with
SSVDCli
job or through mahout CLI withmahout ssvd <options>