Scalding + LZO + Protobuf


Are there any pointers on getting Scalding to work with LZO-compressed Protobuf data on HDFS?

I am trying to use Scalding to read files that are stored as binary Protobuf and compressed with LZO. Can Elephant Bird be used to read those files? Any pointers will be appreciated!

I have looked at LzoTraits and LzoProtobufScheme, but I am not sure how to use them to read the data. Any examples would be great!


Here is an example:

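// Assumed imports (scalding-core, scalding-commons); adjust to your version.
import com.twitter.scalding.FixedPathSource
import com.twitter.scalding.commons.source.LzoProtobuf
// MyProtoClassHere stands for your generated Protobuf message class.
case class SomeProto() extends FixedPathSource("/my/greatData/*")
  with LzoProtobuf[MyProtoClassHere] {
  override def column = classOf[MyProtoClassHere]
}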

You can mix LzoProtobuf into other abstract base sources (such as TimePathedSource or MostRecentGoodSource) in the same way. Mix in LocalTapSource as well if you want to use the Hadoop-inside-Cascading-local trick; if you don't run in Cascading local mode, you don't need it.
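To show how a source like this might be consumed, here is a minimal sketch of a job that reads it with the typed API. It assumes the scalding-core and scalding-commons packages are on the classpath, that MyProtoClassHere is your generated Protobuf class, and that the job is run with a hypothetical --output argument. ReadProtoJob is an illustrative name, not something from the original answer.

```scala
import com.twitter.scalding._
import com.twitter.scalding.commons.source.LzoProtobuf

// Same source definition as in the answer; MyProtoClassHere is a placeholder
// for your generated Protobuf message class.
case class SomeProto() extends FixedPathSource("/my/greatData/*")
  with LzoProtobuf[MyProtoClassHere] {
  override def column = classOf[MyProtoClassHere]
}

// Hypothetical job: reads each Protobuf record and writes its text form as TSV.
class ReadProtoJob(args: Args) extends Job(args) {
  TypedPipe.from(SomeProto())   // TypedPipe[MyProtoClassHere]
    .map(_.toString)            // Protobuf messages print in text format
    .write(TypedTsv[String](args("output")))
}
```

Running on a cluster also requires the native LZO codec to be installed and configured in Hadoop, which is outside what this sketch covers.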