How to submit jobs to a Spark master running locally


I am using R and Spark to run a simple example to test Spark.

I have a Spark master running locally, started with the following:

spark-class org.apache.spark.deploy.master.Master

I can see the status page at http://localhost:8080/

Code:

system("spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 --master local[*]")

suppressPackageStartupMessages(library(SparkR)) # Load the library

sc <- sparkR.session(master = "local[*]")

df <- as.DataFrame(faithful)

head(df)

Now this runs fine when I do the following (the code is saved as 'sparkcode.R'):

Rscript sparkcode.R 

Problem:

But what happens is that a new local Spark instance is created. I want R to use the existing master instance instead (the job should then appear as a completed application at http://localhost:8080/#completed-app).

P.S.: using Mac OS X, Spark 2.1.0 and R 3.3.2

Accepted answer:

A number of things:

  • If you use a standalone cluster, use the correct master URL, which should be sparkR.session(master = "spark://hostname:port"). Both the hostname and the port depend on your configuration, but the standard port is 7077 and the hostname should default to the hostname of the machine running the master. This is the main problem here (see the sketch after this list).
  • Avoid using spark-class directly. This is what the $SPARK_HOME/sbin/ scripts are for (like start-master.sh). They are not crucial, but they handle the small and tedious tasks for you.
  • The standalone master is only a resource manager. You have to start worker nodes as well (the start-slave* scripts).
  • It is usually better to use bin/spark-submit, though it shouldn't matter much here.
  • spark-csv is no longer necessary in Spark 2.x, and even if it were, Spark 2.1 uses Scala 2.11 by default, so the _2.10 artifact would be the wrong one anyway. Not to mention that 1.0.3 is extremely old (it dates back to around Spark 1.3).
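
Putting the first three points together, here is a minimal sketch of what sparkcode.R could look like. The spark://localhost:7077 URL and the appName are assumptions, not something from your setup; use whatever master URL is shown at the top of http://localhost:8080/.

# Before running this, start the standalone master and at least one worker, e.g.:
#   $SPARK_HOME/sbin/start-master.sh
#   $SPARK_HOME/sbin/start-slave.sh spark://localhost:7077

suppressPackageStartupMessages(library(SparkR)) # Load the library

# Connect to the existing standalone master instead of spawning a new local[*] instance
sc <- sparkR.session(
  master  = "spark://localhost:7077",  # assumed default port; check the master UI
  appName = "sparkcode"                # hypothetical name, pick your own
)

df <- as.DataFrame(faithful)
head(df)

# Stopping the session releases the application; it should then show up
# under completed applications at http://localhost:8080/#completed-app
sparkR.session.stop()

Run it with Rscript sparkcode.R as before, or submit it with $SPARK_HOME/bin/spark-submit --master spark://localhost:7077 sparkcode.R (the bin/spark-submit route mentioned above); either way the job should now run against the existing master rather than a fresh local instance.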