spark-submit error loading class with fatjar on macOS


I am trying to run a simple hello-world Spark application.

This is my code:

package com.sd.proj.executables

import org.apache.spark.SparkConf
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.{DataFrame, SparkSession}

class SparkConn {
  def getSparkConn(caller:String) : SparkSession = {
    val conf = new SparkConf().setAppName(caller)
    val spark: SparkSession = SparkSession.builder.config(conf).getOrCreate()
    spark
  }
}

object HelloSpark {
  def sparkDF()(implicit spark:SparkSession):DataFrame = {
    spark.emptyDataFrame
      .withColumn("Title",lit("Hello World!!!"))
  }

  def main(args:Array[String]):Unit ={
    val sparkConn = new SparkConn()
    implicit val spark = sparkConn.getSparkConn(this.getClass.getName)

    val df = sparkDF()
    df.show(false)

    spark.stop()
  }
}

This is my build.gradle

plugins {
    id 'scala'
    id 'idea'
    id 'org.scoverage' version '7.0.0'
}

repositories {
    mavenCentral()
}

sourceSets{
    main{
        scala.srcDirs = ['src/main/scala']
        resources.srcDirs = ['src/main/resources']
    }
    test{
        scala.srcDirs = ['src/test/scala']
        resources.srcDirs = ['src/test/resources']
    }
}

dependencies {
    //scala
    implementation 'org.scala-lang:scala-library:2.12.15'
    implementation 'org.scala-lang:scala-reflect:2.12.15'
    implementation 'org.scala-lang:scala-compiler:2.12.15'
    //spark
    implementation 'org.apache.spark:spark-core_2.12:3.2.0'
    implementation 'org.apache.spark:spark-sql_2.12:3.2.0'
    //junit
    testImplementation 'junit:junit:4.12'
    testImplementation 'org.scalatestplus:scalatestplus-junit_2.12:1.0.0-M2'
}

scoverage{
    scoverageVersion = "1.4.11"
    minimumRate=0.01
}

task fatJar(type:Jar){
    zip64 true
    manifest {
        attributes 'Implementation-Title': 'Gradle Fat Jar',
                'Implementation-Version': '0.1',
                'Main-Class': 'com.sd.proj.executables.HelloSpark'
    }
    duplicatesStrategy = DuplicatesStrategy.EXCLUDE
    baseName = project.name + '-fat'
    from {
        configurations.runtimeClasspath.collect {
            it.isDirectory() ? it : zipTree(it)
        }
    }
    with jar
}
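As a sanity check on a jar built this way (not part of the original build), standard `jar`/`unzip` tooling can show whether the main class actually made it into the fat jar, and whether stale signature files from merged dependency jars survived the merge; the jar path below is the one from this project's build output:

```shell
# Confirm the main class is present at the expected package path inside the jar
jar tf build/libs/HelloSpark-fat.jar | grep 'com/sd/proj/executables/HelloSpark'

# Signature files copied in from signed dependency jars can break class
# loading from a merged jar; check whether any are present
unzip -l build/libs/HelloSpark-fat.jar | grep -Ei '\.(SF|DSA|RSA)$'
```

If the first command prints nothing, the class was never packaged and `spark-submit` will fail to load it regardless of the manifest.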

and this is the project structure

.
├── README.md
├── build
│   ├── classes
│   │   └── scala
│   │       └── main
│   │           └── com
│   │               └── sd
│   │                   └── proj
│   │                       └── executables
│   │                           ├── HelloSpark$.class
│   │                           └── HelloSpark.class
│   ├── generated
│   │   └── sources
│   │       └── annotationProcessor
│   │           └── scala
│   │               └── main
│   ├── libs
│   │   ├── HelloSpark-fat.jar
│   │   └── HelloSpark.jar
│   └── tmp
│       ├── compileScala
│       ├── fatJar
│       │   └── MANIFEST.MF
│       ├── jar
│       │   └── MANIFEST.MF
│       └── scala
│           ├── classfileBackup
│           └── compilerAnalysis
│               ├── compileScala.analysis
│               └── compileScala.mapping
├── build.gradle
├── gradle
│   └── wrapper
│       ├── gradle-wrapper.jar
│       └── gradle-wrapper.properties
├── gradlew
├── gradlew.bat
├── settings.gradle
├── spark_submit.sh
└── src
    ├── main
    │   ├── resources
    │   └── scala
    │       └── com
    │           └── sd
    │               └── proj
    │                   └── executables
    │                       └── HelloSpark.scala
    └── test
        ├── resources
        └── scala

my spark-submit script is

#!/bin/bash

echo "Running spark-submit..."
SPARK_HOME=/opt/homebrew/Cellar/apache-spark/3.2.1
export PATH="$SPARK_HOME/bin/:$PATH"

JARFILE="$(pwd)/build/libs/HelloSpark-fat.jar"

# Run it locally
echo "cmd : ${SPARK_HOME}/bin/spark-submit --class \"com.sd.proj.executables.HelloSpark\" --master local $JARFILE"
${SPARK_HOME}/bin/spark-submit --class "com.sd.proj.executables.HelloSpark" --master local "$JARFILE"

Both Scala and Spark are installed on my Mac:

% type spark-submit
spark-submit is /opt/homebrew/bin/spark-submit
% type scala
scala is /opt/homebrew/opt/[email protected]/bin/scala

When I run the above spark-submit it fails with **Error: Failed to load class com.sd.proj.executables.HelloSpark**.

% bash spark_submit.sh                                  
Running spark-submit...
cmd : /opt/homebrew/Cellar/apache-spark/3.2.1/bin/spark-submit --class "com.sd.proj.executables.HelloSpark" --master local /Users/dsam05/IdeaProjects/HelloSpark/build/libs/HelloSpark-fat.jar
22/11/12 14:35:14 WARN Utils: Your hostname, Soumyajits-MacBook-Air.local resolves to a loopback address: 127.0.0.1; using 192.168.2.21 instead (on interface en0)
22/11/12 14:35:14 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/homebrew/Cellar/apache-spark/3.2.1/libexec/jars/spark-unsafe_2.12-3.2.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Error: Failed to load class com.sd.proj.executables.HelloSpark.
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

I have never run Spark on a Mac before. Can someone please guide me on what I am doing incorrectly here? This is on an M1 Mac, macOS 13.

Solved this problem; posting as it might help someone else.

I removed the fatJar task from my build.gradle and added this config instead:

plugins {
    //added this entry on top of what is already present
    id 'com.github.johnrengelman.shadow' version '7.1.2'
}
//...
//same config as in question above
//...
shadowJar{
    zip64 true
}
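For reference, the shadow jar is then built with the plugin's `shadowJar` task (a quick sketch; the `gradlew` wrapper is the one shown in the project tree above):

```shell
# Build the merged jar; the Shadow plugin names it <project>-all.jar by default
./gradlew shadowJar

# The output lands next to the plain jar
ls build/libs/
```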

Now the jar is created as HelloSpark-all.jar and runs perfectly:

22/11/19 23:25:12 INFO CodeGenerator: Code generated in 88.62025 ms
22/11/19 23:25:12 INFO CodeGenerator: Code generated in 8.248958 ms
+--------------+
|Title         |
+--------------+
|Hello World!!!|
+--------------+

22/11/19 23:25:12 INFO SparkUI: Stopped Spark web UI at http://localhost:4040