want help for running MapReduce programs on Google Cloud storage


I am using Google Cloud Storage with Hadoop 2.3.0 via the GCS connector.

I added GCS.jar to the lib directory of my Hadoop installation and added the path to the GCS connector in the hadoop-env.sh file:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/share/hadoop/common/lib/gcs_connector

I also made the following changes to the core-site.xml file of my Hadoop installation:

   <property>
     <name>fs.defaultFS</name>
     <value>hdfs://127.0.0.1:9000</value>
   </property>
   <property>
     <name>fs.gs.impl</name>
     <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
     <description>The FileSystem for gs: (GCS) uris.</description>
   </property>
   <property>
     <name>fs.AbstractFileSystem.gs.impl</name>
     <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
     <description>The AbstractFileSystem for gs: (GCS) uris. Only necessary for use with Hadoop 2.</description>
   </property>
   <property>
     <name>fs.gs.project.id</name>
     <value>1113</value>
   </property>
   <property>
     <name>fs.gs.system.bucket</name>
     <value>hadoop1</value>
   </property>
   <property>
     <name>fs.gs.working.dir</name>
     <value>/</value>
   </property>
   <property>
     <name>fs.gs.auth.service.account.enable</name>
     <value>true</value>
   </property>
   <property>
     <name>fs.gs.auth.service.account.email</name>
     <value>[email protected]</value>
   </property>
   <property>
     <name>fs.gs.auth.service.account.keyfile</name>
     <value>C://hadoop-2.3.0/etc/hadoop/gcskey.p12</value>
   </property>
   <property>
     <name>fs.gs.auth.client.id</name>
     <value>7168543aovnjqaf1e7sumil.apps.googleusercontent.com</value>
   </property>

Billing is also enabled for the project I created.

I created a bucket, and its contents are visible to me using:

hadoop fs -ls gs://hadoop1 

I also tried the Hadoop click-to-deploy option, and the master and worker VM instances were created.

I installed gcloud and ran auth login. The Git repositories were also created.

I followed the MapReduce article posted by Google, but it does not provide complete guidance.

Question: I want to run MapReduce programs developed in Java on the deployed Hadoop cluster in the cloud. What paths should I provide in my programs for the input and output files?

My programs run fine on the Hadoop installation on my local system.
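For context, this is the path shape I understand a gs:// URI to have: the bucket takes the place of the namenode host in an hdfs:// URI, and the rest is the object path. A minimal sketch (the input/output directory names are hypothetical; "hadoop1" matches fs.gs.system.bucket in my core-site.xml above):

```java
import java.net.URI;

public class GcsPathDemo {
    public static void main(String[] args) {
        // Hypothetical input/output locations inside the bucket "hadoop1".
        URI input = URI.create("gs://hadoop1/input");
        URI output = URI.create("gs://hadoop1/output");

        // The "gs" scheme selects the filesystem registered via fs.gs.impl;
        // the authority is the bucket, the path is the object prefix.
        System.out.println(input.getScheme()); // gs
        System.out.println(input.getHost());   // hadoop1
        System.out.println(input.getPath());   // /input

        // In a MapReduce driver, such strings would presumably be passed
        // unchanged where HDFS paths were used before, e.g.:
        //   FileInputFormat.addInputPath(job, new Path("gs://hadoop1/input"));
        //   FileOutputFormat.setOutputPath(job, new Path("gs://hadoop1/output"));
        System.out.println(output);
    }
}
```

Is replacing the hdfs:// input/output arguments with gs:// URIs of this form all that is needed, given the connector configuration above?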
