Are there any pointers to get Scalding to work with LZO Protobuf data on HDFS?
I am trying to use Scalding to read files that are stored as binary Protobuf and compressed with LZO. Can we use Elephant Bird to read those files? Any pointers would be appreciated!
I have looked at LzoTraits and LzoProtobufScheme, but I am not sure how to use them to read the data. Any examples would be great!
Here is an example:
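A minimal sketch of such a source, assuming scalding-commons is on the classpath and provides an LzoProtobuf[T] trait in com.twitter.scalding.commons.source that asks for the message class through a `column` member. MyProto is a hypothetical protobuf-generated class and MyProtoSource is a name made up for illustration; check both the package and the member name against the version you are using.

```scala
import com.twitter.scalding._
import com.twitter.scalding.commons.source.LzoProtobuf

// MyProto is a hypothetical class generated by protoc
// (it must extend com.google.protobuf.Message).
// Assumption: LzoProtobuf[T] builds its Elephant Bird LzoProtobufScheme
// from the class returned by `column`.
case class MyProtoSource(path: String)
  extends FixedPathSource(path)
  with LzoProtobuf[MyProto] {
  def column = classOf[MyProto]
}
```

In Scalding versions that have the typed API, something like TypedPipe.from(MyProtoSource(args("input"))) inside a Job should then give you a pipe of MyProto messages, though the exact entry point depends on the version.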
You can mix with other types of abstract base Sources (like TimePathedSource or MostRecentGoodSource) in a similar way. You can also mix in LocalTapSource (with LocalTapSource) if you want to use the Hadoop-inside-cascading-local trick (if you don't run in Cascading local mode, you don't need this).