Apache Gora: where to set the new table name in the reducer


I have an application that is basically an HBase MapReduce job using Apache Gora. My case is very simple: I want to copy the data of one HBase table to a new table. Where do I specify the new table name? I have reviewed this guide but could not find where to put the new table name. The relevant code snippet follows:

/* Mappers are initialized with GoraMapper.initMapperJob() or
   * GoraInputFormat.setInput()*/
  GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class,
      LogAnalyticsMapper.class, true);

  /* Reducers are initialized with GoraReducer#initReducerJob().
   * If the output is not to be persisted via Gora, any reducer 
   * can be used instead. */
  GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);

With a plain MapReduce job this case would be very easy.

Alfonso Nishikawa On BEST ANSWER

I will redirect you to the tutorial, but I will try to clarify here :)

The table name is defined in your mappings. Check Table Mappings. You probably have a file called gora-hbase-mapping.xml where the mapping is defined. It should contain something like this:

<table name="Nameofatable">
...
<class name="blah.blah.EntityA" keyClass="java.lang.Long" table="Nameofatable">

That is where you configure the table name (use the same name in both places if you find both). There can be several <table> and <class> elements, for example one pair for your input and one for your output.

AFTER that, you have to instantiate your input/output datastores inStore and outStore. The tutorial got a bit messy and the creation of inStore and outStore ended up in the wrong section. You just do something like:

inStore = DataStoreFactory.getDataStore(String.class, EntityA.class, hadoopConf);
outStore = DataStoreFactory.getDataStore(Long.class, OtherEntity.class, hadoopConf);

The same explanation, the other way round:

  • You instantiate the datastore with DataStoreFactory.getDataStore(key class, entity class, conf).
  • The requested entity class is looked up in gora-hbase-mapping.xml, by matching <class name="blah.blah.EntityA".
  • That <class> element has the attribute table=. That is your table name :)

So: you define one entity as input with its table name, and you define another entity as output with its table name.
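
To make the lookup described above concrete, here is a minimal sketch that resolves a table name from a mapping document using only the JDK's XML parser. This is not Gora's code (Gora's own mapping reader is more involved); the class MappingTableLookup, the method tableFor, the table names SourceTable and TargetTable, and the gora-otd root element are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class MappingTableLookup {

    // Hypothetical mapping, modelled on the fragment quoted in this answer.
    // Table names and the root element name are illustrative assumptions.
    static final String MAPPING =
          "<gora-otd>"
        + "<table name=\"SourceTable\"/>"
        + "<class name=\"blah.blah.EntityA\" keyClass=\"java.lang.String\" table=\"SourceTable\"/>"
        + "<table name=\"TargetTable\"/>"
        + "<class name=\"blah.blah.OtherEntity\" keyClass=\"java.lang.Long\" table=\"TargetTable\"/>"
        + "</gora-otd>";

    // Returns the table= attribute of the <class> element whose name=
    // matches the entity class, or null if no such element exists.
    static String tableFor(String mappingXml, String entityClassName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(mappingXml.getBytes(StandardCharsets.UTF_8)));
        NodeList classes = doc.getElementsByTagName("class");
        for (int i = 0; i < classes.getLength(); i++) {
            Element cls = (Element) classes.item(i);
            if (entityClassName.equals(cls.getAttribute("name"))) {
                return cls.getAttribute("table");
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Input entity and output entity resolve to different tables.
        System.out.println(tableFor(MAPPING, "blah.blah.EntityA"));
        System.out.println(tableFor(MAPPING, "blah.blah.OtherEntity"));
    }
}
```

Running main prints SourceTable and then TargetTable, which mirrors the point of this answer: the entity class you pass when creating a store decides which table that store reads from or writes to.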


EDIT 1:

If the entity class is the same but the table names are different, the only solution I can think of is to create two classes, Entity1 and Entity2, with the same schema, and to define two <table> and two <class> elements in your gora-hbase-mapping.xml. Then instantiate the stores like this:

inStore = DataStoreFactory.getDataStore(String.class, Entity1.class, hadoopConf);
outStore = DataStoreFactory.getDataStore(String.class, Entity2.class, hadoopConf);

It is not very clean but it should work :\
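
For illustration, the two-entity mapping described in this edit might look roughly like the fragment below. It is only a sketch: the table names SourceTable and TargetTable, the column family f, and the field someField are hypothetical placeholders, and the real <field> entries have to match your entity schema.

```xml
<!-- Hypothetical names throughout; adapt them to your own schema. -->
<table name="SourceTable">
  <family name="f"/>
</table>
<class name="blah.blah.Entity1" keyClass="java.lang.String" table="SourceTable">
  <field name="someField" family="f" qualifier="someField"/>
</class>

<table name="TargetTable">
  <family name="f"/>
</table>
<class name="blah.blah.Entity2" keyClass="java.lang.String" table="TargetTable">
  <field name="someField" family="f" qualifier="someField"/>
</class>
```

With a mapping like this, the store created for Entity1 would read from SourceTable and the store created for Entity2 would write to TargetTable.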


EDIT 2 (not for this question):

If the source table and the destination table are the same, there is a version of initReducerJob that allows this behavior. An example is in Nutch's GeneratorJob.java:

StorageUtils.initMapperJob(currentJob, fields, SelectorEntry.class, WebPage.class, GeneratorMapper.class, SelectorEntryPartitioner.class, true);
StorageUtils.initReducerJob(currentJob, GeneratorReducer.class);