In Apache Spark, where is "SPARK_HOME/launcher/target/scala-2.13" and how can I use it?

When I launch a Spark program in local-cluster mode, I get the following error:


17:45:33.930 [ExecutorRunner for app-20231004174533-0000/0] ERROR org.apache.spark.deploy.worker.ExecutorRunner - Error running executor
java.lang.IllegalStateException: Cannot find any build directories.
    at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:228) ~[spark-launcher_2.13-3.5.0.jar:3.5.0]
    at org.apache.spark.launcher.AbstractCommandBuilder.getScalaVersion(AbstractCommandBuilder.java:241) ~[spark-launcher_2.13-3.5.0.jar:3.5.0]
    at org.apache.spark.launcher.AbstractCommandBuilder.buildClassPath(AbstractCommandBuilder.java:195) ~[spark-launcher_2.13-3.5.0.jar:3.5.0]
    at org.apache.spark.launcher.AbstractCommandBuilder.buildJavaCommand(AbstractCommandBuilder.java:118) ~[spark-launcher_2.13-3.5.0.jar:3.5.0]
    at org.apache.spark.launcher.WorkerCommandBuilder.buildCommand(WorkerCommandBuilder.scala:39) ~[spark-core_2.13-3.5.0.jar:3.5.0]
    at org.apache.spark.launcher.WorkerCommandBuilder.buildCommand(WorkerCommandBuilder.scala:45) ~[spark-core_2.13-3.5.0.jar:3.5.0]
    at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:63) ~[spark-core_2.13-3.5.0.jar:3.5.0]
    at org.apache.spark.deploy.worker.CommandUtils$.buildProcessBuilder(CommandUtils.scala:51) ~[spark-core_2.13-3.5.0.jar:3.5.0]
    at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:160) [spark-core_2.13-3.5.0.jar:3.5.0]
    at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:80) [spark-core_2.13-3.5.0.jar:3.5.0]

Analyzing Spark's source code leads to the following snippet that causes the error:

(the following is part of the Spark 3.5.0 source code: AbstractCommandBuilder.java, around line 227)


  String getScalaVersion() {
    String scala = getenv("SPARK_SCALA_VERSION");
    if (scala != null) {
      return scala;
    }
    String sparkHome = getSparkHome();
    File scala213 = new File(sparkHome, "launcher/target/scala-2.13");
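    // In a binary distribution there is no launcher/target/ directory, so this check fails.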
    checkState(scala213.isDirectory(), "Cannot find any build directories.");
    return "2.13";
    // ...
  }

The intention of this function is to check for the existence of "SPARK_HOME/launcher/target/scala-2.13" in order to determine which Scala version the deployed Spark was built with. Unfortunately, this directory only exists in a source checkout of the Spark project; the binary distribution of Spark doesn't have it.

Should this function be improved to be compatible with both distributions?

UPDATE 1: Thanks a lot to Anish for the suggestion that the Spark binary distribution doesn't contain the Scala binaries. But in fact it does: the Scala library jars ship under SPARK_HOME/jars.

These jars could be a more reliable way to determine the Scala version, but at the moment they aren't used for that.
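
For illustration, here is a minimal sketch of how the Scala version could instead be derived from those jar names. This is not part of Spark; inferScalaVersion is a hypothetical helper:

import java.io.File

// Hypothetical helper: derive the Scala binary version ("2.12" or "2.13") from the
// scala-library jar shipped in SPARK_HOME/jars, e.g. scala-library-2.13.8.jar.
def inferScalaVersion(sparkHome: String): Option[String] = {
  val jarDir = new File(sparkHome, "jars")
  Option(jarDir.listFiles()).getOrElse(Array.empty[File])
    .map(_.getName)
    .collectFirst { case name if name.startsWith("scala-library-") =>
      name.stripPrefix("scala-library-").split("\\.").take(2).mkString(".")
    }
}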

There are 2 answers below.

Answer from VonC:

The Spark code at org.apache.spark.launcher.AbstractCommandBuilder#getScalaVersion() comes from commit 2da6d1a and PR 43125, with SPARK-32434 before that.

That seems pretty much hard-coded, which means before launching your Spark application, you need to set the SPARK_SCALA_VERSION environment variable to the Scala version you are using. That should bypass the directory check that is failing in getScalaVersion().
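
For example, here is a sketch of how the variable could be supplied when the application is started programmatically through org.apache.spark.launcher.SparkLauncher (this is an assumption about how you launch the job; the Spark home path, master URL, jar and main class below are placeholders). If you start the driver from a shell or an IDE run configuration instead, exporting SPARK_SCALA_VERSION=2.13 there has the same effect:

import org.apache.spark.launcher.SparkLauncher
import scala.jdk.CollectionConverters._

object LaunchWithScalaVersion {
  def main(args: Array[String]): Unit = {
    // Pass SPARK_SCALA_VERSION in the child environment so getScalaVersion()
    // returns early instead of probing launcher/target/scala-2.13.
    val env = Map("SPARK_SCALA_VERSION" -> "2.13").asJava
    val process = new SparkLauncher(env)
      .setSparkHome("/opt/spark-3.5.0-bin-hadoop3-scala2.13")  // placeholder path
      .setMaster("local-cluster[2,1,1024]")                    // placeholder master URL
      .setAppResource("target/scala-2.13/my-app.jar")          // placeholder application jar
      .setMainClass("com.example.Main")                        // placeholder main class
      .launch()
    process.waitFor()
  }
}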

Answer from Anish B.:

I'm not sure what you did, but I didn't face any issue while running Spark.

To be precise, you have to select the Apache Spark binary that contains the Scala libraries, which is the spark-3.5.0-bin-hadoop3-scala2.13.tgz file.

Note: I don't have Scala installed.

Step-by-step process of how it worked on my local machine:

  1. Go to https://spark.apache.org/downloads.html:

    • Choose a Spark release: 3.5.0 (Sep 13 2023)
    • Choose a package type: Pre-built for Apache Hadoop 3.3 and later (Scala 2.13) from the dropdown.
  2. Click on Download Spark: spark-3.5.0-bin-hadoop3-scala2.13.tgz

  3. After downloading, extract it on your local machine.

  4. Now, open a terminal in the /bin folder.

  5. Execute the command ./spark-shell --master local to run Spark in local mode. It will work.

To verify whether or not it's running in local mode, enter sc.isLocal in the Scala shell.
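
A sketch of what that check looks like in the spark-shell session (the res identifiers will vary):

scala> sc.isLocal
res0: Boolean = true

scala> sc.version
res1: String = 3.5.0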

Note: the Spark 3.5.0 binary is pre-packaged with the Scala libraries, so it shouldn't throw that error.

Go to the /jars directory and you will find all of the Scala libraries there.

That's all.