Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.CanSetDropBehind issue in Eclipse

I have the following Spark word count program:

    package com.sample.spark;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.*;
    import org.apache.spark.api.java.function.FlatMapFunction;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.api.java.function.PairFlatMapFunction;
    import org.apache.spark.api.java.function.PairFunction;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import scala.Tuple2;


    public class SparkWordCount {

        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("wordcountspark").setMaster("local").setSparkHome("/Users/hadoop/spark-1.4.0-bin-hadoop1");
            JavaSparkContext sc = new JavaSparkContext(conf);
            //SparkConf conf = new SparkConf();
            //JavaSparkContext sc = new JavaSparkContext("hdfs", "Simple App","/Users/hadoop/spark-1.4.0-bin-hadoop1", new String[]{"target/simple-project-1.0.jar"});
            JavaRDD<String> textFile = sc.textFile("hdfs://localhost:54310/data/wordcount");
            JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
              public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
            });
            JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
                public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }

            });

            JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
                  public Integer call(Integer a, Integer b) { return a + b; }
                });  
            counts.saveAsTextFile("hdfs://localhost:54310/data/output/spark/outfile");

        }


    }

I get the Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.CanSetDropBehind exception when I run the code from Eclipse; however, if I export it as a runnable jar and run it from the terminal as below, it works:

      bin/spark-submit --class com.sample.spark.SparkWordCount --master local  /Users/hadoop/spark-1.4.0-bin-hadoop1/finalJars/SparkJar-v2.jar

The Maven POM looks like this:

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>com.sample.spark</groupId>
        <artifactId>SparkRags</artifactId>
        <packaging>jar</packaging>
        <version>1.0-SNAPSHOT</version>
        <name>SparkRags</name>
        <url>http://maven.apache.org</url>
        <dependencies>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>3.8.1</version>
                <scope>test</scope>
            </dependency>
            <dependency> <!-- Spark dependency -->
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.10</artifactId>
                <version>1.4.0</version>
                <scope>compile</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>0.23.11</version>
                <scope>compile</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-core</artifactId>
                <version>1.2.1</version>
                <scope>compile</scope>
            </dependency>
    </dependencies>
    </project>

There are 2 answers below.

Accepted answer:

When you run in Eclipse, the referenced jars are the only source your program has to run from. So the hadoop-core jar (that's where CanSetDropBehind lives) is not being added properly to your Eclipse build path from the local repository for some reason. You need to work out whether this is a proxy issue or some other problem with the POM.
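
One way to check whether hadoop-core actually made it onto the Eclipse/Maven classpath (a suggestion of mine, not something from the original answer) is to print the resolved dependency tree and filter it to the Hadoop artifacts:

    # Print the resolved Maven dependency tree, showing only org.apache.hadoop artifacts;
    # hadoop-core (and hadoop-common) should appear here if resolution from the local
    # repository succeeded.
    mvn dependency:tree -Dincludes=org.apache.hadoop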

When you run the jar from the terminal, the likely reason it works is that the jar is present on the referenced classpath. Also, while running from the terminal, you could choose to build those jars into a fat jar (one that bundles hadoop-core). I assume you are not using that option while creating your jar; with a fat jar the reference would be picked up from inside your jar, without depending on the classpath.
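
As an illustration only (the original answer does not include this), a fat jar that bundles the compile-scope dependencies, including hadoop-core, could be built with the maven-shade-plugin, roughly like this:

    <!-- Hypothetical sketch: shade all compile-scope dependencies
         (including hadoop-core) into a single runnable jar at package time. -->
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

With this in place, mvn package would produce a jar whose dependencies are picked up from inside the jar itself rather than from the external classpath.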

Verify each step, and it will help you identify the cause. Happy coding!

Another answer:

Found that this was caused by the hadoop-common jar for version 0.23.11 not having the class. I changed the version to 2.7.0 and also added the dependency below:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.7.0</version>
    </dependency>
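
For reference, the Hadoop part of the dependencies section would then presumably look something like this (a reconstruction based on the description above, not the exact POM):

    <!-- Reconstructed sketch: hadoop-common bumped to 2.7.0, plus the
         newly added hadoop-mapreduce-client-core dependency. -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.0</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.7.0</version>
    </dependency>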

That got rid of the error, but I am still seeing the error below:

    Exception in thread "main" java.io.EOFException: End of File Exception between local host is: "mbr-xxxx.local/127.0.0.1"; destination host is: "localhost":54310; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException