MRUnit Create HBase Result Properly


I have a MapReduce job whose mapper reads from several HBase tables. It works fine on my cluster. I am retroactively writing unit tests with MRUnit. To supply input to the map() method, I am trying to build a Result object from a list of manually instantiated KeyValue objects. When I then read my columns in map(), only the first KeyValue in the list seems to be retained in the Result; the others come back null. In the code below I have a single column family named "0".

private MapDriver<ImmutableBytesWritable, Result, Text, Text> mapDriver;
private HopperHbaseMapper hopperHbaseMapper;

@Before
public void setUp() {    
  hopperHbaseMapper = new HopperHbaseMapper();
  mapDriver = MapDriver.newMapDriver(hopperHbaseMapper);    
}

@Test
public void testMapHbase() throws Exception {    
  String testKey = "123";
  ImmutableBytesWritable key = new ImmutableBytesWritable(testKey.getBytes());    
  List<KeyValue> keyValues = new ArrayList<KeyValue>();
  KeyValue keyValue1 = new KeyValue(testKey.getBytes(), "0".getBytes(), "first_name".getBytes(), "Joe".getBytes());
  KeyValue keyValue2 = new KeyValue(testKey.getBytes(), "0".getBytes(), "last_name".getBytes(), "Blow".getBytes());
  keyValues.add(keyValue1);
  keyValues.add(keyValue2);
  Result result = new Result(keyValues);
  mapDriver.withInput(key, result);
  mapDriver.withOutput(new Text(testKey), new Text(testKey + "\tJoe\tBlow"));
  mapDriver.runTest();
}

Am I creating the Result object incorrectly? As mentioned, the mapper works fine on real HBase data on my cluster, so I believe it is my test setup that is at fault.


There are 2 best solutions below


In the newest HBase libraries the Result constructor is deprecated, so use the Result.create method instead. While writing my solution I ran into the same problem as the question's author. The fix came from a comment by Sakthivel. Here is Sakthivel's solution implemented in Scala.

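import org.apache.hadoop.hbase.{CellUtil, KeyValue}
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes.toBytes
import scala.collection.immutable.TreeSet

// Use HBase's own comparator so the TreeSet keeps the cells in HBase sort order
implicit val ordering = KeyValue.COMPARATOR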

val cells = TreeSet(
      CellUtil.createCell(toBytes("myRowKey"), toBytes("myColumnFamily"),toBytes("myQualifier1"), 1000L, KeyValue.Type.Minimum.getCode, toBytes("myValue1")),
      CellUtil.createCell(toBytes("myRowKey"), toBytes("myColumnFamily"),toBytes("myQualifier2"), 1000L, KeyValue.Type.Minimum.getCode, toBytes("myValue2")),
      CellUtil.createCell(toBytes("myRowKey"), toBytes("myColumnFamily"),toBytes("myQualifier3"), 1000L, KeyValue.Type.Minimum.getCode, toBytes("myValue3")),
      CellUtil.createCell(toBytes("myRowKey"), toBytes("myColumnFamily"),toBytes("myQualifier4"), 1000L, KeyValue.Type.Minimum.getCode, toBytes("myValue4")),
      CellUtil.createCell(toBytes("myRowKey"), toBytes("myColumnFamily"),toBytes("myQualifier5"), 1000L, KeyValue.Type.Minimum.getCode, toBytes("myValue5"))
    )

val result = Result.create(cells.toArray)

I hope this helps anyone writing unit tests for HBase functionality.


Like row keys, HBase also stores columns in lexicographic order. So you have to collect the cells in a TreeSet<KeyValue> set = new TreeSet<KeyValue>(KeyValue.COMPARATOR); and pass its sorted contents to the Result constructor as an array, as in new Result(kvs).

TreeSet<KeyValue> set = new TreeSet<KeyValue>(KeyValue.COMPARATOR);

byte[] row = Bytes.toBytes("row01");
byte[] cf = Bytes.toBytes("cf");
set.add(new KeyValue(row, cf, "cone".getBytes(), Bytes.toBytes("row01_cone_one")));
set.add(new KeyValue(row, cf, "ctwo".getBytes(), Bytes.toBytes("row01_ctwo_two")));
set.add(new KeyValue(row, cf, "cthree".getBytes(), Bytes.toBytes("row01_cthree_three")));
set.add(new KeyValue(row, cf, "cfour".getBytes(), Bytes.toBytes("row01_cfour_four")));
set.add(new KeyValue(row, cf, "cfive".getBytes(), Bytes.toBytes("row01_cfive_five")));
set.add(new KeyValue(row, cf, "csix".getBytes(), Bytes.toBytes("row01_csix_six")));

KeyValue[] kvs = new KeyValue[set.size()];
set.toArray(kvs);

Result result = new Result(kvs);
mapDriver.withInput(key, result);
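
To see why the order matters without pulling in HBase, here is a minimal sketch in plain Java. It assumes, as far as I understand the client code, that Result finds a column by binary search over its cells, which only works when they are sorted. The class QualifierOrderSketch and its members (LEXICOGRAPHIC, toBytes, found) are hypothetical names made up for this illustration and are not part of HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

public class QualifierOrderSketch {

    // Unsigned, byte-by-byte comparison; a stand-in for HBase's ordering (assumption).
    public static final Comparator<byte[]> LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Encode qualifier names as byte arrays, keeping the order they were given in.
    public static byte[][] toBytes(String... names) {
        byte[][] out = new byte[names.length][];
        for (int i = 0; i < names.length; i++) {
            out[i] = names[i].getBytes(StandardCharsets.UTF_8);
        }
        return out;
    }

    // Binary search for a qualifier; only reliable if the array is sorted.
    public static boolean found(byte[][] qualifiers, String name) {
        byte[] key = name.getBytes(StandardCharsets.UTF_8);
        return Arrays.binarySearch(qualifiers, key, LEXICOGRAPHIC) >= 0;
    }

    public static void main(String[] args) {
        String[] names = {"cone", "ctwo", "cthree", "cfour", "cfive", "csix"};

        byte[][] insertionOrder = toBytes(names); // like an ArrayList filled by hand
        byte[][] sorted = toBytes(names);
        Arrays.sort(sorted, LEXICOGRAPHIC);       // like the TreeSet contents

        for (String name : names) {
            System.out.println(name
                    + ": insertion order=" + found(insertionOrder, name)
                    + ", sorted=" + found(sorted, name));
        }
    }
}
```

With the qualifiers from the snippet above, every lookup succeeds on the sorted array, while on the insertion-order array some lookups (for example "ctwo") miss. This only illustrates the sorted-input requirement; it does not reproduce HBase's actual comparator, which also takes row, family, timestamp and type into account.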
