Achieving concurrency through FAIR scheduling in Spark


My Environment: I connect Cassandra to Spark through the Spark Thrift Server. I create a meta-table in the Hive Metastore that holds the Cassandra table data, and a web application queries this meta-table through the JDBC driver. Fair scheduling is enabled for the Spark Thrift Server.
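For context, the web application's access path is essentially a plain Hive JDBC connection to the Thrift Server, along the lines of the sketch below. The host, port, credentials, and the table name cassandra_meta_table are placeholders, not the actual values from my environment:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class MetaTableQuery {
        public static void main(String[] args) throws Exception {
            // Hive JDBC driver used to talk to the Spark Thrift Server.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://thrift-server-host:10000/default", "user", "");
                 Statement stmt = conn.createStatement();
                 // cassandra_meta_table stands in for the Hive meta-table name.
                 ResultSet rs = stmt.executeQuery(
                     "SELECT * FROM cassandra_meta_table LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }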

Issue: When I run a JMeter load test with 100 concurrent users for a duration of 300 seconds, I get sub-second response times for the initial requests (say, the first 30 seconds). Then the response time gradually increases (to around 2 to 3 seconds). When I check the Spark UI, every job executes in less than 100 milliseconds. I also notice that jobs and tasks are in a pending state when requests are received. So I assume that even though the tasks take well under a second to execute, they are submitted with a latency by the scheduler. How do I fix this latency in job submission?

Following are my configuration details:

Number of workers: 2
Number of executors per worker: 1
Number of cores per executor: 14
Total cores of the workers: 30
Memory per executor: 20 GB
Total memory of the workers: 106 GB
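Expressed as Spark properties, the sizing above corresponds roughly to the sketch below. This assumes the settings are picked up from spark-defaults.conf when the Thrift Server starts, that the executors cap at 28 of the 30 cores (2 executors x 14 cores), and the allocation file path is a placeholder:

    spark.executor.memory            20g
    spark.executor.cores             14
    spark.cores.max                  28
    spark.scheduler.mode             FAIR
    spark.scheduler.allocation.file  /path/to/fairscheduler.xml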

Configuration in the fair scheduler XML:

    <?xml version="1.0"?>
    <allocations>
      <pool name="default">
        <schedulingMode>FAIR</schedulingMode>
        <weight>2</weight>
        <minShare>15</minShare>
      </pool>
      <pool name="test">
        <schedulingMode>FIFO</schedulingMode>
        <weight>2</weight>
        <minShare>3</minShare>
      </pool>
    </allocations>
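A JDBC session only lands in a specific pool if it sets one explicitly; otherwise its queries run in the default pool. Assuming the Statement from the JDBC sketch above, selecting a pool is a one-line SET of the session variable spark.sql.thriftserver.scheduler.pool:

    // Route subsequent queries from this session to the "default" FAIR pool.
    stmt.execute("SET spark.sql.thriftserver.scheduler.pool=default");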

I'm executing in Spark Standalone mode.

There is 1 answer below.


Isn't it the case that queries are pending in the queue while others are running? Try reducing spark.locality.wait to, say, 1s.
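If locality wait does turn out to be the cause, the setting can be lowered in spark-defaults.conf (or passed as a --conf option when launching the Thrift Server); the 1s value below is only a starting point to experiment with:

    spark.locality.wait    1s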