Expected timestamp in Flume event headers, but it was null

I am using the config below to load Twitter feeds into HDFS using Flume, but I get the error "Expected timestamp in the Flume event headers, but it was null".

twitter.conf

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken =  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords = bigdata, hadoop, hive, hbase
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = /user/farooque/bigdata/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

      

I run the agent with this command:

$ flume-ng agent --conf-file twitter.conf --name TwitterAgent

      

where twitter.conf is the name of my config file.

But I get this error:

java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:200)
        at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:396)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:388)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:745)
15/06/04 18:26:01 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.

      


+3




3 answers


In twitter.conf I added one more configuration property:

TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true

      



and the problem was solved.

For more details, refer to Hadoop tutorial.info.

+7




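With the option TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true, the HDFS sink uses its own local time at the destination when it resolves the escape sequences in the path. If you instead want each event to carry its own timestamp, use the timestamp interceptor on the source, which stamps the event with the time at which the interceptor processes it. Add the lines below to your config or properties file.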



TwitterAgent.sources.Twitter.interceptors = interceptor1
TwitterAgent.sources.Twitter.interceptors.interceptor1.type = timestamp
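
To make the effect of the interceptor concrete, here is a minimal sketch of what it does to an event's headers. This is a simplified model, not Flume's own class: the names TimestampStampSketch and stamp are hypothetical, and the preserveExisting flag is modelled on the interceptor option of the same name.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a timestamp interceptor (hypothetical class, not Flume code).
final class TimestampStampSketch {

    // Returns a copy of the headers with a "timestamp" entry in epoch milliseconds.
    // If preserveExisting is true, an existing "timestamp" header is left untouched.
    static Map<String, String> stamp(Map<String, String> headers,
                                     long nowMillis,
                                     boolean preserveExisting) {
        Map<String, String> out = new HashMap<>(headers);
        if (preserveExisting && out.containsKey("timestamp")) {
            return out;
        }
        out.put("timestamp", Long.toString(nowMillis));
        return out;
    }

    public static void main(String[] args) {
        // An event from a source that sets no headers, as in the question.
        Map<String, String> headers = new HashMap<>();
        Map<String, String> stamped =
                stamp(headers, System.currentTimeMillis(), false);
        System.out.println("timestamp header: " + stamped.get("timestamp"));
    }
}
```

Because the header is set at the source, every sink downstream of the channel sees the same value, which is the practical difference from hdfs.useLocalTimeStamp.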

      

0




You are using org.apache.flume.source.twitter.TwitterSource, which is the Twitter source provided by Apache. It does not put a timestamp header in the Flume event. So you have two options:

1) Use com.cloudera.flume.source.TwitterSource in your config file instead.

2) Or add the property TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true to your config file.

Please note that you are facing this problem because your HDFS path, /user/farooque/bigdata/tweets/%Y/%m/%d/%H/, contains time escape sequences (%Y, %m, %d, %H), and the sink needs a timestamp to resolve them. If you don't use them in the path, both the Apache and Cloudera sources will work without any problem.
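
The check that fails can be illustrated with a small, self-contained sketch. It is not Flume's actual BucketPath code: the class PathEscapeSketch and the method resolve are hypothetical, only the four escapes from the question are handled, and UTC is assumed to keep the result deterministic (Flume uses the local time zone unless configured otherwise).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.HashMap;
import java.util.Map;

// Simplified model of time-escape resolution in a sink path (not Flume code).
final class PathEscapeSketch {

    static String resolve(String path, Map<String, String> headers,
                          boolean useLocalTimeStamp, long nowMillis) {
        // A path without escape sequences needs no timestamp at all.
        if (!path.contains("%")) {
            return path;
        }
        long ts;
        if (useLocalTimeStamp) {
            // Take the time from the sink side instead of the event.
            ts = nowMillis;
        } else {
            String header = headers.get("timestamp");
            if (header == null) {
                // Same message as in the question's stack trace.
                throw new NullPointerException(
                        "Expected timestamp in the Flume event headers, but it was null");
            }
            ts = Long.parseLong(header);
        }
        ZonedDateTime t = Instant.ofEpochMilli(ts).atZone(ZoneOffset.UTC);
        return path
                .replace("%Y", String.format("%04d", t.getYear()))
                .replace("%m", String.format("%02d", t.getMonthValue()))
                .replace("%d", String.format("%02d", t.getDayOfMonth()))
                .replace("%H", String.format("%02d", t.getHour()));
    }

    public static void main(String[] args) {
        String path = "/user/farooque/bigdata/tweets/%Y/%m/%d/%H/";
        Map<String, String> noHeaders = new HashMap<>();
        // Works: the sink supplies the time itself.
        System.out.println(resolve(path, noHeaders, true, System.currentTimeMillis()));
        // Fails like the question: no header and no local timestamp.
        try {
            resolve(path, noHeaders, false, System.currentTimeMillis());
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

In this model, any one of the three fixes from the answers avoids the exception: a source or interceptor that sets the header, useLocalTimeStamp = true, or a path without escapes.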

0








