I am trying to do LDA topic analysis using SparkR, but I am not sure what format the input data should be in.
I have a cleaned text file (from the 20 Newsgroups dataset) which I created in R. I saved it as CSV and then read it with read.df to get a SparkDataFrame:
df <- read.df("text.example.csv", "csv", header=FALSE, inferSchema = "true")
However, when I run spark.lda:
model <- spark.lda(df, k = 10, maxIter = 500, optimizer="online")
I get an error:
16/12/30 18:51:37 ERROR RBackendHandler: fit on org.apache.spark.ml.r.LDAWrapper failed
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
(and many more lines), which I guess is caused by the format of the input.
Does anyone know how to successfully run LDA in SparkR?
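For reference, the SparkR documentation lists a `features` argument on spark.lda (default "features") that names the column holding the documents, and it accepts either a libSVM-format column or a character column. A CSV read with header=FALSE produces columns named _c0, _c1, and so on, so there is no column called "features", which may be related to the failure. Below is a minimal sketch of the call with that argument set explicitly. It assumes one cleaned document per line in a plain-text file (the path text.example.txt is hypothetical) and a working local Spark installation, and it is untested against the data described above:

```r
library(SparkR)

# Start a local Spark session (assumes Spark is installed and SparkR can find it).
sparkR.session(master = "local[*]")

# Hypothetical file: one cleaned document per line.
# The "text" source yields a single string column named "value".
df <- read.df("text.example.txt", source = "text")
printSchema(df)

# Name the text column explicitly instead of relying on the default "features".
# maxIter is kept small here only to make a first run quick.
model <- spark.lda(df, features = "value", k = 10, maxIter = 50, optimizer = "online")

# Inspect the fitted model (topics, vocabulary, log-likelihood).
summary(model)

sparkR.session.stop()
```

If the data has to stay in CSV, the same idea would be to pass the name of the text column (for example "_c0") as `features`, assuming that column holds the document text as strings.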