I am trying to submit distCP job from a spring boot application on a REST API call.
version of spring: 1.5.13.RELEASE hadoop version: 2.7.3
below is the code I am using to instantiate the DistCP:
List<Path> srcPathList = new ArrayList<Path>();
srcPathList.add(new Path("hdfs://<cluster>/tmp/<user>/source"));
Path targetPath = new Path("hdfs://<cluster>/tmp/<user>/destination");
DistCpOptions distCpOptions = new DistCpOptions(srcPathList,targetPath);
DistCp distCp = new DistCp(configuration,distCpOptions);
Job job = distCp.execute();
The job is submitted successfully to the cluster, however the job fails due to ClassNotFoundException on the cluster. Below is the exception:
INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED;
cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
java.lang.RuntimeException: java.lang.ClassNotFoundException:
Class org.apache.hadoop.tools.mapred.CopyOutputFormat not found
Why does this happen? Any pointers around this would be very helpful!! Thanks!
I found the reason via viewing the
job.jar
on the NodeManager machine. The structure of job.jar is:this is unreasonable.
I tried to replace the jar package with war,it works!
and then add start class: