Apache Gora Reducer for multi-table output with HBase


I have a small amount of data in an HBase table, crawled via Nutch, which uses Apache Gora as its ORM. I have found a lot of MapReduce examples that process data in a single HBase table, but my problem is that I have to copy data into multiple tables (in the reducer). Without Gora there are some guides, e.g., this question, but how can I do it in my case?

I have never done what you are asking, but you might glimpse the answer in the "Constructing the job" section of the Gora Tutorial. There is an example of reducer configuration there that says:

/* Mappers are initialized with GoraMapper.initMapper() or 
 * GoraInputFormat.setInput()*/
GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class
    , LogAnalyticsMapper.class, true);

/* Reducers are initialized with GoraReducer#initReducer().
 * If the output is not to be persisted via Gora, any reducer 
 * can be used instead. */
GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);
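
For context (my addition, not part of the tutorial snippet): inStore and outStore there are plain Gora data stores. If I read the tutorial right, they are opened roughly like this; Pageview and MetricDatum are the tutorial's generated classes, so in a Nutch setup you would use your own schema class instead, and the import packages depend on how your classes were generated:

import org.apache.gora.store.DataStore;
import org.apache.gora.store.DataStoreFactory;
import org.apache.hadoop.conf.Configuration;
// Pageview and MetricDatum are the tutorial's generated beans; their
// package depends on your own schema compilation, so adjust the imports.

Configuration conf = new Configuration();
DataStore<Long, Pageview> inStore =
    DataStoreFactory.getDataStore(Long.class, Pageview.class, conf);
DataStore<String, MetricDatum> outStore =
    DataStoreFactory.getDataStore(String.class, MetricDatum.class, conf);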

Then, instead of using GoraReducer.initReducerJob(), you can configure your own reducer, as described in the question you linked (assuming that answer is correct):

GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class
    , LogAnalyticsMapper.class, true);
/* Output is no longer handled by Gora: use HBase's native
 * MultiTableOutputFormat and your own reducer instead. */
job.setOutputFormatClass(MultiTableOutputFormat.class);
job.setReducerClass(MyReducer.class);
job.setNumReduceTasks(2);
TableMapReduceUtil.addDependencyJars(job);
TableMapReduceUtil.addDependencyJars(job.getConfiguration());
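
One caveat that I believe applies here (again my addition, not from the linked answer): MultiTableOutputFormat only routes each write to the table named by the output key; as far as I know it does not create tables, so every destination table has to exist before the job runs. A sketch with the classic HBaseAdmin API, where "tableName" and the "cf" column family are placeholders:

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Assumption: destination tables must pre-exist for MultiTableOutputFormat.
HBaseAdmin admin = new HBaseAdmin(job.getConfiguration());
if (!admin.tableExists("tableName")) {
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("tableName"));
    desc.addFamily(new HColumnDescriptor("cf"));
    admin.createTable(desc);
}
admin.close();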

Note that in the example above the mapper emits (TextLong, LongWritable) key-value pairs, so your reducer would be something like this (adapted from the question you linked and its answer):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.log4j.Logger;
// plus the import for the generated TextLong class (package depends on your schema)

public class MyReducer extends TableReducer<TextLong, LongWritable, ImmutableBytesWritable> {

    private static final Logger logger = Logger.getLogger( MyReducer.class );

    @Override
    protected void reduce( TextLong key, Iterable<LongWritable> data, Context context ) throws IOException, InterruptedException {
        logger.info( "Working on ---> " + key.toString() );

        // The mapper emits LongWritable values, so aggregate them here.
        long sum = 0;
        for ( LongWritable value : data ) {
            sum += value.get();
        }

        // Row key and column are illustrative; adapt them to your schema.
        Put put = new Put( Bytes.toBytes( key.toString() ) );
        put.addColumn( Bytes.toBytes( "cf" ), Bytes.toBytes( "count" ), Bytes.toBytes( sum ) );

        // With MultiTableOutputFormat, the output key names the target table.
        ImmutableBytesWritable tableKey = new ImmutableBytesWritable( Bytes.toBytes( "tableName" ) );
        context.write( tableKey, put );
    }
}
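
And since the whole point is writing to multiple tables: nothing stops reduce() from emitting under several table keys, so the same Put can go to more than one table. Something like this, where "summary" and "archive" are made-up table names:

// Inside reduce(): the output key names the destination table, so writing
// the same Put under two keys copies it into two tables.
context.write(new ImmutableBytesWritable(Bytes.toBytes("summary")), put);
context.write(new ImmutableBytesWritable(Bytes.toBytes("archive")), put);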

Again, I have never done this... so maybe it doesn't work :\