Hadoop DistCp from a Spring Boot application - ClassNotFoundException

I am trying to submit a DistCp job from a Spring Boot application in response to a REST API call.

Spring Boot version: 1.5.13.RELEASE. Hadoop version: 2.7.3.

Below is the code I am using to instantiate DistCp:

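import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.tools.DistCpOptions;

List<Path> srcPathList = new ArrayList<Path>();
srcPathList.add(new Path("hdfs://<cluster>/tmp/<user>/source"));

Path targetPath = new Path("hdfs://<cluster>/tmp/<user>/destination");

// `configuration` is the org.apache.hadoop.conf.Configuration for the cluster
DistCpOptions distCpOptions = new DistCpOptions(srcPathList, targetPath);
DistCp distCp = new DistCp(configuration, distCpOptions);
Job job = distCp.execute();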

The job is submitted to the cluster successfully, but it then fails on the cluster with a ClassNotFoundException. Below is the exception:

INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; 
cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException:  
java.lang.RuntimeException: java.lang.ClassNotFoundException: 
Class org.apache.hadoop.tools.mapred.CopyOutputFormat not found

Why does this happen? Any pointers would be very helpful. Thanks!

There is 1 solution below

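I found the cause by inspecting the job.jar on the NodeManager machine. The structure of job.jar is:

BOOT-INF/classes/xxx

This is the layout of a Spring Boot executable jar, where application classes and dependencies are nested under BOOT-INF. A standard class loader on the cluster does not look there, so the DistCp classes cannot be found.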
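
To confirm the same thing on your side, you can copy the job.jar off the node and list its entries. The following is only a minimal sketch: the class name JarLayoutCheck and its helper methods are made up for illustration, and the jar path is passed in as a command-line argument. It uses only java.util.jar from the JDK.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Hadoop or Spring Boot.
class JarLayoutCheck {

    // True if any entry sits under BOOT-INF/, the directory used by
    // Spring Boot executable jars for application classes and libraries.
    static boolean hasBootInfLayout(Collection<String> entryNames) {
        for (String name : entryNames) {
            if (name.startsWith("BOOT-INF/")) {
                return true;
            }
        }
        return false;
    }

    // Collects the entry names of the jar at the given path.
    static List<String> listEntries(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a local copy of job.jar (assumed to be supplied by the caller)
        List<String> names = listEntries(args[0]);
        System.out.println("BOOT-INF layout: " + hasBootInfLayout(names));
    }
}
```

If the check reports a BOOT-INF layout, the jar shipped to the cluster is the Spring Boot executable jar rather than a plain jar with classes at its root.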

I switched the packaging from jar to war, and it works:

<packaging>war</packaging>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <!-- exclude the embedded Tomcat -->
    <exclusions>
        <exclusion>
            <artifactId>spring-boot-starter-tomcat</artifactId>
            <groupId>org.springframework.boot</groupId>
        </exclusion>
    </exclusions>
</dependency>
<!-- servlet API, provided by the external Tomcat at runtime -->
<dependency>
    <groupId>org.apache.tomcat</groupId>
    <artifactId>tomcat-servlet-api</artifactId>
    <version>7.0.47</version>
    <scope>provided</scope>
</dependency>
...

Then add a startup class:

import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.boot.web.support.SpringBootServletInitializer;

public class SpringBootStartApplication extends SpringBootServletInitializer {

    @Override
    protected SpringApplicationBuilder configure(SpringApplicationBuilder builder) {
        // Register the main application class so the servlet container can start it
        return builder.sources(xxxPortalApplication.class);
    }
}