I am able to run wordcount on Alluxio with an example jar provided by Cloudera, using:
sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar wordcount -libjars /home/nn1/alluxio-1.2.0/core/client/target/alluxio-core-client-1.2.0-jar-with-dependencies.jar alluxio://nn1:19998/wordcount alluxio://nn1:19998/wc1
and it's a success.
However, it fails when I use a jar built from the attached code, which is also a sample wordcount program:
sudo -u hdfs hadoop jar /home/nn1/HadoopWordCount-0.0.1-SNAPSHOT-jar-with-dependencies.jar edu.am.bigdata.C45TreeModel.C45DecisionDriver -libjars /home/nn1/alluxio-1.2.0/core/client/target/alluxio-core-client-1.2.0-jar-with-dependencies.jar alluxio://10.30.60.45:19998/abdf alluxio://10.30.60.45:19998/outabdf
The code above is built with Maven. The pom.xml file contains:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>2.6.0-mr1-cdh5.4.5</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0-cdh5.4.5</version>
</dependency>
Could you please help me run my wordcount program on an Alluxio cluster? I hope no extra configuration is needed in the pom file to run it.
I get the following error when I run my jar:
java.lang.IllegalArgumentException: Wrong FS: alluxio://10.30.60.45:19998/outabdf, expected: hdfs://10.30.60.45:8020
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
    at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1215)
    at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1211)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1211)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1412)
    at edu.WordCount.run(WordCount.java:47)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at edu.WordCount.main(WordCount.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
The problem comes from the call to
FileSystem fs = FileSystem.get(conf);
on line 101. The FileSystem created by FileSystem.get(conf) only supports paths with the scheme defined by Hadoop's fs.defaultFS property. To fix the error, change that line to
FileSystem fs = FileSystem.get(URI.create("alluxio://nn1:19998/"), conf);
By passing a URI, you override fs.defaultFS, enabling the created FileSystem to support paths using the alluxio:// scheme.
You could also fix the error by modifying fs.defaultFS in your core-site.xml. However, this could impact other systems that share the core-site.xml file, so I recommend the first approach of passing an alluxio:// URI to FileSystem.get().
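To see why the exception mentions "expected: hdfs://...", here is a rough, standard-library-only sketch of the kind of scheme and authority comparison behind the "Wrong FS" message. It is a simplified imitation for illustration, not Hadoop's actual checkPath code; the class name SchemeCheckSketch and its checkPath method are hypothetical, and the URIs are taken from the command and stack trace in the question.

```java
import java.net.URI;
import java.util.Objects;

public class SchemeCheckSketch {
    // Simplified stand-in (an assumption, not Hadoop's implementation) for the
    // check that rejects a path whose scheme or authority differs from the
    // file system it is handed to.
    static void checkPath(URI fsUri, URI path) {
        if (path.getScheme() == null) {
            return; // a scheme-less path is resolved against this file system
        }
        boolean sameScheme = path.getScheme().equalsIgnoreCase(fsUri.getScheme());
        boolean sameAuthority = Objects.equals(path.getAuthority(), fsUri.getAuthority());
        if (!sameScheme || !sameAuthority) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://10.30.60.45:8020");      // what fs.defaultFS points at
        URI alluxioFs = URI.create("alluxio://10.30.60.45:19998/"); // what the fix passes explicitly
        URI output = URI.create("alluxio://10.30.60.45:19998/outabdf");

        checkPath(alluxioFs, output); // same scheme and authority: accepted

        try {
            checkPath(defaultFs, output); // the situation with FileSystem.get(conf)
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A file system bound to the hdfs:// default rejects the alluxio:// output path, while one created from an alluxio:// URI accepts it, which is the effect of passing the URI to FileSystem.get().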