I have a standalone Spark cluster running on a virtual machine on my computer. Spark Streaming gets data from Kafka, saves it to an HBase table, then processes it and saves the result to another table.
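For context, the streaming layer is shaped roughly like this (a heavily simplified sketch; the topic name, table name, and column family below are placeholders rather than my real ones):

import java.util.*;

import kafka.serializer.StringDecoder;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;

public class Stream {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("Streaming");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // Direct Kafka stream over the raw-data topic
        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "localhost:9092");
        Set<String> topics = Collections.singleton("raw");
        JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
                jssc, String.class, String.class,
                StringDecoder.class, StringDecoder.class, kafkaParams, topics);

        // Write every received record into the raw-data HBase table,
        // using the Kafka message key as the row key
        messages.foreachRDD(new VoidFunction<JavaPairRDD<String, String>>() {
            @Override
            public void call(JavaPairRDD<String, String> rdd) throws Exception {
                rdd.foreachPartition(new VoidFunction<Iterator<Tuple2<String, String>>>() {
                    @Override
                    public void call(Iterator<Tuple2<String, String>> records) throws Exception {
                        Connection conn =
                                ConnectionFactory.createConnection(HBaseConfiguration.create());
                        Table table = conn.getTable(TableName.valueOf("raw_data"));
                        while (records.hasNext()) {
                            Tuple2<String, String> record = records.next();
                            Put put = new Put(Bytes.toBytes(record._1()));
                            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"),
                                    Bytes.toBytes(record._2()));
                            table.put(put);
                        }
                        table.close();
                        conn.close();
                    }
                });
            }
        });

        jssc.start();
        jssc.awaitTermination();
    }
}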
A Spark batch job queries the table of processed results for the latest entry and uses that to determine which data to query from the unprocessed-data table. The batch job sits in an infinite while loop, so it restarts as soon as it finishes. Both it and the streaming job have the scheduler set to FAIR.
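Fair scheduling is set through the Spark configuration in both drivers, and the batch driver loops roughly like this (a simplified sketch; the two helper methods are placeholders standing in for the real HBase scan and processing code):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class Batch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("Batch")
                .set("spark.scheduler.mode", "FAIR"); // same setting in the streaming driver
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Restart the batch as soon as it finishes
        while (true) {
            // Find the latest processed entry, then process everything newer than it
            String latestProcessed = readLatestResult(sc);
            processNewData(sc, latestProcessed);
        }
    }

    // Placeholders for the real scan/processing code
    private static String readLatestResult(JavaSparkContext sc) { return "0"; }
    private static void processNewData(JavaSparkContext sc, String lowBoundary) { }
}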
I have a client app that runs all of this in the proper order: it first streams the generated data into Kafka, then launches the streaming layer on a separate thread, and after a certain delay launches the batch layer on another thread.
My issue is that the streaming job runs without complaint, using 2 of the 3 provided cores, but once the batch job starts, the stream still reports that it is running, yet the HBase tables clearly show that while the batch jobs write to their table, the streaming jobs stop writing anything. The streaming logs also pause while this is happening.
This is how I set up the threads to be run:
Runnable batch = new Runnable() {
    @Override
    public void run() {
        try {
            Lambda.startBatch(lowBoundary, highBoundary);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
};
Thread batchThread = new Thread(batch);
batchThread.start();
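The streaming layer is launched from an identical runnable, and the batch thread is started only after the delay, roughly like this (the streamingThread variable and the sleep value here are just illustrative):

// Streaming goes first; the batch thread starts after a delay
streamingThread.start();
Thread.sleep(30000); // illustrative delay before launching the batch layer
batchThread.start();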
Both the batch and the streaming layers are started through ProcessBuilder, like this:
public static void startBatch(String low, String high) throws Exception {
    // Specify executable path
    String sparkSubmit = "/home/lambda/Spark/bin/spark-submit";
    // Describe the process to be run
    ProcessBuilder batch = new ProcessBuilder(sparkSubmit,
            "--class", "batch.Batch", "--master",
            "spark://dissertation:7077",
            "/home/lambda/Downloads/Lambda/target/lambda-1.0-jar-with-dependencies.jar",
            low, high);
    // Start the batch layer
    batch.start();
}
Does anyone have an idea why this is happening? I suspect Spark simply isn't scheduling the tasks the way I want it to, but I have no idea what to do about it.