I'm trying to run the simple example provided in the README of spark-xml, but the code won't run:
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
.format("com.databricks.spark.xml")
.option("rowTag", "book")
.load("books.xml")
(copy-pasted from the README; books.xml is indeed in the local directory)
This gives me the error:
Name: Compile Error
Message: :1: error: illegal start of definition
.format("com.databricks.spark.xml") ^
StackTrace:
I'm running this from a Jupyter notebook with Spark/Scala kernel.
I'm sure there's a simple mistake, but I'm brand new to Scala/Spark.
Version info:
- Spark: 2.0.1
- Scala: 2.11.8
You can add packages to Spark using the --packages command-line option.
As per the comment and question, try running the code as a single line. The Spark/Scala REPL evaluates input line by line, so a line that begins with .format(...) is parsed as the start of a new (illegal) definition; writing the whole call as one statement resolves the "error: illegal start of definition".
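For example, the same call collapsed into one statement (assuming, as in the question, that the kernel already provides sc and that books.xml is in the working directory):

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// One line = one complete statement, so line-by-line REPL
// evaluation can no longer split the method chain apart.
val df = sqlContext.read.format("com.databricks.spark.xml").option("rowTag", "book").load("books.xml")
```

Alternatively, in a plain spark-shell you can keep the multi-line layout by wrapping it in :paste mode, which compiles the pasted block as a single unit.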
Next, for "Failed to find data source: com.databricks.spark.xml", add the library dependency/package "com.databricks:spark-xml_2.11:0.4.1".
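For instance, if your notebook kernel is backed by spark-shell, the package can be supplied at launch (coordinates taken from above; the _2.11 suffix must match your Scala version):

```shell
# Launch spark-shell with spark-xml on the classpath;
# --packages resolves the artifact from Maven Central at startup.
spark-shell --packages com.databricks:spark-xml_2.11:0.4.1
```

How to pass such options through to the kernel depends on the Jupyter Spark/Scala kernel you are using; many of them read a SPARK_OPTS or similar environment variable for this purpose.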