Input for spark.lda

I am trying to run LDA topic analysis using SparkR, but I am not sure what format the input should be in.

I have a cleaned text file (I am working with the 20 Newsgroups dataset) which I created in R. I save it as a CSV and then read it with read.df to get a SparkDataFrame:

df <- read.df("text.example.csv", "csv", header=FALSE, inferSchema = "true")

However, when I run spark.lda:

model <- spark.lda(df, k = 10, maxIter = 500, optimizer="online")

I get an error:

16/12/30 18:51:37 ERROR RBackendHandler: fit on org.apache.spark.ml.r.LDAWrapper failed
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

(followed by many more lines). I suspect the cause is the input format.

Does anyone know how to successfully run LDA in SparkR?
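
One thing worth checking, offered as a minimal sketch rather than a confirmed fix: spark.lda takes a features argument that names the input column (its default is "features"), and that column can be either a character column of raw text or a libSVM-format vector column. A CSV read with header=FALSE has auto-generated column names, so the default "features" column would not exist. The sketch below assumes that the first column of the file holds one document per row and that a Spark 2.1+ session is available. Whether this is the actual cause of the error above is an assumption, since the full stack trace is not shown.

```r
library(SparkR)
sparkR.session()  # assumes a working Spark 2.1+ installation

# Same read as in the question; with header = FALSE Spark names the columns itself.
df <- read.df("text.example.csv", "csv", header = FALSE, inferSchema = "true")
printSchema(df)  # check that the text sits in a single string column

# Assumption: the first column holds one document per row.
text_col <- columns(df)[1]

# Point spark.lda at the text column instead of the default "features".
model <- spark.lda(df, features = text_col, k = 10, maxIter = 500,
                   optimizer = "online")

summary(model)                    # topics, vocabulary, log-likelihood, etc.
head(spark.posterior(model, df))  # per-document topic distributions
```

If printSchema shows more than one column, the documents were probably split on embedded commas, and the file would need to be written so that each document stays in a single field.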
