Reading from HBase with Scalding


I'm very new to Cascading/Scalding and cannot figure out how to read data from HBase.

I have a table in HBase where the hand history of poker games is stored (in a very straightforward manner: id -> hand, serialized with ProtoBuf). The job below should go through the entire history and build a dictionary of all players:

class DictionaryBuilder(args: Args) extends Job(args) {

  val input = new HBaseSource("hand", "localhost", 'hand, Array("d"), Array("blob"))
  val output = TextLine("tutorial/data/output0.txt")

  input
    .read
    .flatMap('hand -> 'player) { handBytes: Array[Byte] =>
      HandHistory.parseFrom(handBytes).getPlayerList.map(_.getName)
    }
    .write(output)

}

However, when I run the job above, the following error is thrown:

Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
    at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:73)
    at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:106)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:163)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:661)

This suggests that the data arriving in flatMap is not a byte array I can work with directly.

What am I missing?


1 Answer


Have a look at the SpyGlass project, https://github.com/ParallelAI/SpyGlass, which provides an HBase tap for Scalding. If you want to avoid dealing with Array[Byte] yourself, SpyGlass also offers the fromBytesWritable method to convert the fields you need to work with.
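
If you would rather keep the raw bytes for ProtoBuf, here is a minimal sketch of unwrapping the value by hand. It assumes the tap hands each field to flatMap as an HBase ImmutableBytesWritable (which the existence of fromBytesWritable hints at, but check your tap) rather than a plain Array[Byte]. WritableBytes and validRegion are hypothetical names made up for this illustration, not part of Scalding, SpyGlass or HBase.

```scala
import java.util.Arrays

object WritableBytes {
  // Hypothetical helper: copy only the valid region [offset, offset + length)
  // out of a wrapper's backing array, so the parser never sees padding bytes.
  def validRegion(backing: Array[Byte], offset: Int, length: Int): Array[Byte] =
    Arrays.copyOfRange(backing, offset, offset + length)

  def main(args: Array[String]): Unit = {
    // Simulated wrapper contents: a 3-byte payload surrounded by unrelated bytes.
    val backing = Array[Byte](9, 9, 1, 2, 3, 9)
    println(validRegion(backing, 2, 3).mkString(","))
  }
}

// Assumed usage inside the job (not verified against your tap). It needs
// org.apache.hadoop.hbase.io.ImmutableBytesWritable and
// scala.collection.JavaConverters._ on the classpath:
//
//   .flatMap('hand -> 'player) { w: ImmutableBytesWritable =>
//     val bytes = WritableBytes.validRegion(w.get, w.getOffset, w.getLength)
//     HandHistory.parseFrom(bytes).getPlayerList.asScala.map(_.getName)
//   }
```

Also check which field actually carries the serialized hand. If 'hand is the row-key field in your HBaseSource and the payload sits in the blob column, parsing 'hand will fail no matter how the bytes are unwrapped.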