How to submit a Spark job locally and connect to a Cassandra cluster

Can anyone please let me know how to submit a Spark job from my local machine and connect to a Cassandra cluster?

Currently I log in to a Cassandra node through PuTTY and submit the Spark job with the dse spark-submit command below.

Command: dse spark-submit --class ***** --total-executor-cores 6 --executor-memory 2G **/**/**.jar --config-file build/job.conf --args

With the above command, my Spark job is able to connect to the cluster and execute, but it sometimes runs into issues.

So I want to submit the Spark job from my local machine. Can anyone please guide me on how to do this?

There is 1 best solution below.

There are several things you could mean by "run my job locally". Here are a few of my interpretations:

Run the Spark Driver on a Local Machine but access a remote Cluster's resources

I would not recommend this, for a few reasons. The biggest is that all of your job management will still be handled between your local machine and the executors in the cluster. This would be equivalent to having a Hadoop JobTracker running in a different cluster than the rest of the Hadoop distribution.

To accomplish this, though, you need to run spark-submit with the cluster's master URI. Additionally, you would need to specify a Cassandra node via spark.cassandra.connection.host:

dse spark-submit --master spark://sparkmasterip:7077 --conf spark.cassandra.connection.host=aCassandraNode --flags jar

It is important that you keep the jar LAST. All arguments after the jar are interpreted as arguments for the application and not spark-submit parameters.
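If you would rather set these values in the driver code than on the command line, a minimal sketch could look like the following. The master URI, Cassandra host, and keyspace/table names are placeholders, and it assumes the spark-cassandra-connector is on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // spark-cassandra-connector RDD functions

object RemoteClusterJob {
  def main(args: Array[String]): Unit = {
    // Placeholder master URI and Cassandra node -- replace with your own values.
    val conf = new SparkConf()
      .setAppName("RemoteClusterJob")
      .setMaster("spark://sparkmasterip:7077")
      .set("spark.cassandra.connection.host", "aCassandraNode")

    val sc = new SparkContext(conf)

    // Read a placeholder Cassandra table as an RDD and count its rows.
    val rows = sc.cassandraTable("my_keyspace", "my_table")
    println(s"Row count: ${rows.count()}")

    sc.stop()
  }
}
```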

Run Spark Submit on a local Machine but have the Driver run in the Cluster (Cluster Mode)

Cluster mode means your local machine sends the jar and environment settings over to the Spark Master. The Spark Master then chooses a worker to actually run the driver, and the driver is started as a separate JVM by that worker. This is triggered using the --deploy-mode cluster flag, in addition to specifying the Master and Cassandra connection host.

dse spark-submit --master spark://sparkmasterip:7077 --deploy-mode cluster --conf spark.cassandra.connection.host=aCassandraNode --flags jar
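
In cluster mode the master and deploy mode come from spark-submit rather than from the code, so a driver sketch along these lines would leave the master unset (the Cassandra host below is still a placeholder):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClusterModeJob {
  def main(args: Array[String]): Unit = {
    // No setMaster() here: the master and deploy mode are supplied by spark-submit.
    val conf = new SparkConf()
      .setAppName("ClusterModeJob")
      .set("spark.cassandra.connection.host", "aCassandraNode") // placeholder node

    val sc = new SparkContext(conf)

    // Arguments arrive exactly as they were placed AFTER the jar on the command line.
    println(s"Application arguments: ${args.mkString(" ")}")

    sc.stop()
  }
}
```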

Run the Spark Driver in Local Mode

Finally, there is a local mode for Spark which starts the entire Spark framework in a single JVM. This is mainly used for testing. Local mode is activated by passing `--master local`.
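
As a rough sketch, a smoke test run entirely in local mode (no cluster or Cassandra node needed) might look like this:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalModeSmokeTest {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs the driver and executors together in this single JVM,
    // using all available cores -- handy for testing before submitting to the cluster.
    val conf = new SparkConf()
      .setAppName("LocalModeSmokeTest")
      .setMaster("local[*]")

    val sc = new SparkContext(conf)
    val sum = sc.parallelize(1 to 100).sum() // trivial sanity check
    println(s"Sum of 1..100 computed in local mode: $sum")
    sc.stop()
  }
}
```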

For more information, check out the Spark documentation on submitting applications:

http://spark.apache.org/docs/latest/submitting-applications.html