Opening a gzip-compressed file from Hive returns "Not a data file"


Below is my configuration, which works perfectly fine for uncompressed data:

agent.sinks.test.type = hdfs
agent.sinks.test.hdfs.useLocalTimeStamp = true
agent.sinks.test.hdfs.path = s3n://AccessKeys@test/%{topic}/utc=%s
agent.sinks.test.hdfs.roundUnit = minute
agent.sinks.test.hdfs.round = true
agent.sinks.test.hdfs.roundValue = 10
agent.sinks.test.hdfs.fileSuffix = .avro
agent.sinks.test.serializer = com.test.flume.sink.serializer.GenericRecordAvroEventSerializer$Builder
agent.sinks.test.hdfs.fileType = DataStream
agent.sinks.test.hdfs.maxOpenFiles=100
agent.sinks.test.hdfs.appendTimeout = 5000
agent.sinks.test.hdfs.callTimeout = 4000
agent.sinks.test.hdfs.rollInterval = 60
agent.sinks.test.hdfs.rollSize = 0 
agent.sinks.test.hdfs.rollCount = 1000
agent.sinks.test.hdfs.batchSize = 1000
agent.sinks.test.hdfs.threadsPoolSize=100

I am trying to add gzip compression to it as follows:

agent.sinks.test.type = hdfs
agent.sinks.test.hdfs.useLocalTimeStamp = true
agent.sinks.test.hdfs.path = s3n://AccessKeys@test/%{topic}/utc=%s
agent.sinks.test.hdfs.roundUnit = minute
agent.sinks.test.hdfs.round = true
agent.sinks.test.hdfs.roundValue = 10
agent.sinks.test.hdfs.fileSuffix = .avro
agent.sinks.test.serializer = com.test.flume.sink.serializer.GenericRecordAvroEventSerializer$Builder
agent.sinks.test.hdfs.fileType = CompressedStream
agent.sinks.test.hdfs.codeC = gzip
agent.sinks.test.hdfs.maxOpenFiles=100
agent.sinks.test.hdfs.appendTimeout = 10000
agent.sinks.test.hdfs.callTimeout = 4000
agent.sinks.test.hdfs.rollInterval = 60
agent.sinks.test.hdfs.rollSize = 0
agent.sinks.test.hdfs.rollCount = 1000
agent.sinks.test.hdfs.batchSize = 1000
agent.sinks.test.hdfs.threadsPoolSize=100

All of the above data is stored in S3, and when I try to read it from Hive I get the following error:

Exception in thread "main" java.io.IOException: Not an Avro data file
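
One way to narrow this down is to look at the first bytes of one of the files the sink wrote. The sketch below is only a diagnostic aid, not a fix. It assumes a file has already been copied from S3 to local disk, and the function names `classify_header` and `classify_file` are made up for illustration. It relies on two known signatures: an Avro object container file starts with the bytes `Obj` followed by `0x01`, and a gzip stream starts with `0x1f 0x8b`.

```python
import gzip

AVRO_MAGIC = b"Obj\x01"   # header of an Avro object container file
GZIP_MAGIC = b"\x1f\x8b"  # header of a gzip stream


def classify_header(header: bytes) -> str:
    """Classify the leading bytes of a file as 'avro', 'gzip' or 'unknown'."""
    if header.startswith(AVRO_MAGIC):
        return "avro"
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"


def classify_file(path: str) -> str:
    """Report the outer format of a local file and, if gzip, the inner one."""
    with open(path, "rb") as f:
        outer = classify_header(f.read(4))
    if outer != "gzip":
        return outer
    # The file is a gzip stream: decompress and look at what is wrapped inside.
    with gzip.open(path, "rb") as f:
        inner = classify_header(f.read(4))
    return "gzip(" + inner + ")"
```

If `classify_file` reports `gzip(avro)` for a file ending in `.avro`, the whole Avro container has been wrapped in a gzip stream, so a reader that expects the Avro header at byte 0 would not find it. Whether that is what happens with this configuration is an assumption to verify against an actual file.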

Can you please let me know why my configuration is not working?
