Expected timestamp in the Flume event headers, but it was null


I am using the configuration below to push Twitter feeds into HDFS with Flume, but I get the error "Expected timestamp in the Flume event headers, but it was null".

twitter.conf

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken =  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords = bigdata, hadoop, hive, hbase
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = /user/farooque/bigdata/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

I run the agent with this command:

$ flume-ng agent --conf-file twitter.conf --name TwitterAgent

where twitter.conf is the name of my config file.

But I get this error:

java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
        at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:388)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:745)
15/06/04 18:26:01 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.

Any further help would be appreciated.

There are 3 answers below.

Best answer

In twitter.conf, I added one more config property:

TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true

and the issue was resolved.

For more details, refer to Hadoop tutorial.info.

Answer 2

You are using org.apache.flume.source.twitter.TwitterSource, which is the Twitter source provided by Apache Flume. It does not add a timestamp header to the Flume events it produces. So you have two options:

1) Use com.cloudera.flume.source.TwitterSource in your config file instead.

2) Add the property TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true to your config file.

Note that you hit this issue because your HDFS path, /user/farooque/bigdata/tweets/%Y/%m/%d/%H/, contains time escape sequences (%Y, %m, %d, %H), which the sink resolves from the timestamp header of each event. If you don't use them, both the Apache and the Cloudera sources work without any problem.
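
As a rough illustration of that last point, here is a minimal sketch of a sink path with no time escape sequences. The flat target directory is only an assumption for the example; with a path like this the HDFS sink has no reason to look up the timestamp header, at the cost of losing the per-hour directory layout.

```properties
# Assumption for illustration: one flat directory instead of date/hour buckets.
# There are no %Y/%m/%d/%H escapes here, so no timestamp header is needed.
TwitterAgent.sinks.HDFS.hdfs.path = /user/farooque/bigdata/tweets/
```

If you want to keep the date-based directories, one of the two options above is still needed.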

Answer 3

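With TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true, the HDFS sink resolves the escape sequences in the path from the local time of the machine it runs on, at the moment it writes the event. If you want the timestamp to be set when the event enters the agent instead, use a timestamp interceptor on the source, which adds a timestamp header to each event as the source processes it. Add the lines below to the configuration (properties) file: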

TwitterAgent.sources.Twitter.interceptors = interceptor1
TwitterAgent.sources.Twitter.interceptors.interceptor1.type = timestamp
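
A possible way to round out the interceptor setup is sketched below, reusing the agent, source, and sink names from the question. The preserveExisting line is optional and is included here as an assumption about what you might want: it tells the interceptor to keep a timestamp header that an event already carries rather than overwrite it.

```properties
# Optional: keep an existing timestamp header instead of overwriting it
# (the interceptor's default for this setting is false).
TwitterAgent.sources.Twitter.interceptors.interceptor1.preserveExisting = true

# With the interceptor filling in the timestamp header, the sink path from the
# question can keep its time escapes without hdfs.useLocalTimeStamp.
TwitterAgent.sinks.HDFS.hdfs.path = /user/farooque/bigdata/tweets/%Y/%m/%d/%H/
```

Note that the interceptor records the time at which the agent handles the event, which is not necessarily the time the tweet was created.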