Expected timestamp in Flume event headers, but it was null
I am using the configuration below to stream Twitter feeds into HDFS with Flume, but the agent fails with the error "Expected timestamp in the Flume event headers, but it was null".
twitter.conf
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords = bigdata, hadoop, hive, hbase
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = /user/farooque/bigdata/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
I start the agent with this command:
$ flume-ng agent --conf-file twitter.conf --name TwitterAgent
where twitter.conf is the name of my config file, but I get this error:
java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:388)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
15/06/04 18:26:01 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
I added one more property to twitter.conf:
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
and the problem was solved.
For more details, refer to Hadoop tutorial.info.
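With the option "TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true", the HDFS sink uses its own local time (the time at the destination) when it resolves the escape sequences in the path, instead of a timestamp taken from the event headers. If you want the timestamp to be attached to the event itself when the source handles it, use a timestamp interceptor, which adds a timestamp header to every event. Add the lines below to the config or properties file.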
TwitterAgent.sources.Twitter.interceptors = interceptor1
TwitterAgent.sources.Twitter.interceptors.interceptor1.type = timestamp
You are using org.apache.flume.source.twitter.TwitterSource, which is the Twitter source provided by Apache Flume. It does not put a timestamp header on the Flume events it creates. So you have 2 options:
1) Use com.cloudera.flume.source.TwitterSource as the source type in your config file.
2) Or add the property TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true to your config file.
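As a rough sketch of option 1, only the source type line would need to change. This assumes the jar that provides the Cloudera source class has already been added to Flume's classpath (it is not part of the stock Apache Flume distribution), and that the other source properties from the question stay as they are:

```properties
# Option 1 (sketch): switch to the Cloudera Twitter source.
# Assumption: the jar containing this class is on Flume's classpath.
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
```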
Please note that you are facing this problem because your HDFS path uses time escape sequences: /user/farooque/bigdata/tweets/%Y/%m/%d/%H/. Resolving them requires a timestamp for each event. If you don't use them, both the Apache and Cloudera sources will work without any problem.
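For illustration, a path without time escape sequences would avoid the timestamp lookup altogether, at the cost of losing the per-hour directory layout. The line below is simply the path from the question with the escapes removed, not a recommended layout:

```properties
# Sketch: no %Y/%m/%d/%H escapes, so the sink does not need a timestamp header.
TwitterAgent.sinks.HDFS.hdfs.path = /user/farooque/bigdata/tweets/
```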