I'm trying to get Spark to read uncompressed Thrift files from S3, but so far it hasn't worked.
- Data is loaded into S3 as uncompressed Thrift files. The source is AWS Kinesis Firehose.
- I have a tool that deserializes these files with no problem, so I know the Thrift serialization/deserialization itself works.
- In Spark, I'm using newAPIHadoopFile.
- Using Elephant Bird's LzoThriftBlockInputFormat, I can successfully read LZO-compressed Thrift files.
- I can't figure out which InputFormat to use to read uncompressed Thrift files.
Is this possible with any of the existing InputFormats, or do I have to implement my own?
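For reference, here is roughly how the working LZO-compressed case looks. This is a hedged sketch, not code from the question: `MyThriftEvent` is a placeholder for your Thrift-generated class, and the bucket path is made up; the Elephant Bird class names are from its `mapreduce` packages.

```scala
import com.twitter.elephantbird.mapreduce.input.LzoThriftBlockInputFormat
import com.twitter.elephantbird.mapreduce.io.ThriftWritable
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.LongWritable
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("lzo-thrift-read"))

val conf = new Configuration(sc.hadoopConfiguration)
// Tell the input format which generated Thrift class to deserialize into.
// MyThriftEvent is a stand-in for your own Thrift-generated type.
LzoThriftBlockInputFormat.setClassConf(classOf[MyThriftEvent], conf)

val rdd = sc.newAPIHadoopFile(
  "s3://my-bucket/events/",                           // hypothetical path
  classOf[LzoThriftBlockInputFormat[MyThriftEvent]],
  classOf[LongWritable],
  classOf[ThriftWritable[MyThriftEvent]],
  conf)

// Unwrap the ThriftWritable to get the deserialized Thrift objects.
val events = rdd.map { case (_, writable) => writable.get() }
```

This only works because the LZO block format carries its own record framing; plain uncompressed Thrift files have no such markers, which is why none of the stock InputFormats apply directly.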
I ended up writing my own custom Thrift deserializer. It required implementing a custom InputFormat and a custom RecordReader. I'm still surprised that such classes don't already exist in some library. The two classes have been tested and work, but since I stopped working on the project soon after solving this, the code is not cleaned up.
https://github.com/mklosi/thrift-deserializer
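A minimal sketch of what such a pair of classes might look like; this is illustrative only, not the code from the repo above. It assumes each file contains back-to-back `TBinaryProtocol`-encoded records and reads files whole, since raw Thrift streams have no sync markers to split on. `MyThriftEvent` again stands in for your Thrift-generated class.

```scala
import org.apache.hadoop.fs.{FSDataInputStream, Path}
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.mapreduce.{InputSplit, JobContext, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, FileSplit}
import org.apache.thrift.protocol.TBinaryProtocol
import org.apache.thrift.transport.TIOStreamTransport

class UncompressedThriftInputFormat extends FileInputFormat[LongWritable, MyThriftEvent] {
  // Raw Thrift has no record boundaries a splitter could find, so never split.
  override def isSplitable(context: JobContext, file: Path): Boolean = false

  override def createRecordReader(split: InputSplit,
                                  context: TaskAttemptContext): RecordReader[LongWritable, MyThriftEvent] =
    new UncompressedThriftRecordReader
}

class UncompressedThriftRecordReader extends RecordReader[LongWritable, MyThriftEvent] {
  private var in: FSDataInputStream = _
  private var protocol: TBinaryProtocol = _
  private var length = 0L
  private val key = new LongWritable(0)
  private var value: MyThriftEvent = _

  override def initialize(split: InputSplit, context: TaskAttemptContext): Unit = {
    val fileSplit = split.asInstanceOf[FileSplit]
    length = fileSplit.getLength
    val fs = fileSplit.getPath.getFileSystem(context.getConfiguration)
    in = fs.open(fileSplit.getPath)
    // Wrap the raw stream in a Thrift transport/protocol pair.
    protocol = new TBinaryProtocol(new TIOStreamTransport(in))
  }

  override def nextKeyValue(): Boolean = {
    if (in.getPos >= length) return false // consumed the whole file
    value = new MyThriftEvent()
    value.read(protocol)                  // deserialize one record in place
    key.set(key.get() + 1)
    true
  }

  override def getCurrentKey: LongWritable = key
  override def getCurrentValue: MyThriftEvent = value
  override def getProgress: Float =
    if (length == 0) 1f else math.min(1f, in.getPos.toFloat / length)
  override def close(): Unit = if (in != null) in.close()
}
```

With these in place, the Spark side is the same `newAPIHadoopFile` call as before, just with `UncompressedThriftInputFormat` and `MyThriftEvent` as the format and value classes.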