Streaming API with Tweepy only returns the second most recent tweet, not the most recent one

I'm new not only to Python but to programming in general, so I'm very grateful for your help!

I am trying to filter all tweets from the twitter streaming API using Tweepy.

I filtered by user id and confirmed that tweets are being collected in real time.

However, it seems that only the second most recent tweet comes through in real time, not the most recent one.

Can you guys help?

import tweepy
import webbrowser
import time
import sys

consumer_key = 'xyz'
consumer_secret = 'zyx'


## Getting access key and secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth_url = auth.get_authorization_url()
print 'From your browser, please click AUTHORIZE APP and then copy the unique PIN: ' 
webbrowser.open(auth_url)
verifier = raw_input('PIN: ').strip()
auth.get_access_token(verifier)
access_key = auth.access_token.key
access_secret = auth.access_token.secret


## Authorizing account privileges
auth.set_access_token(access_key, access_secret)


## Get the local time
localtime = time.asctime( time.localtime(time.time()) )


## Status changes
api = tweepy.API(auth)
api.update_status('It worked - Current time is %s' % localtime)
print 'It worked - now go check your status!'


## Filtering the firehose
user = []
print 'Follow tweets from which user ID?'
handle = raw_input(">")
user.append(handle)

keywords = []
print 'What keywords do you want to track? Separate with commas.'
key = raw_input(">")
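# Split the comma-separated input into individual keywords
keywords.extend(k.strip() for k in key.split(',') if k.strip())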

class CustomStreamListener(tweepy.StreamListener):

    def on_status(self, status):

        # We'll simply print some values in a tab-delimited format
        # suitable for capturing to a flat file but you could opt 
        # store them elsewhere, retweet select statuses, etc.



        try:
            print "%s\t%s\t%s\t%s" % (status.text, 
                                      status.author.screen_name, 
                                      status.created_at, 
                                      status.source,)
        except Exception, e:
            print >> sys.stderr, 'Encountered Exception:', e
            pass

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

# Create a streaming API and set a timeout value of ??? seconds.

streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(), timeout=None)

# Optionally filter the statuses you want to track by providing a list
# of users to "follow".

print >> sys.stderr, "Filtering public timeline for %s" % keywords

# follow expects a list of user IDs, so pass the list rather than the raw string
streaming_api.filter(follow=user, track=keywords)

      



2 answers


I had the same problem. In my case the answer was not as simple as running Python unbuffered, and I suspect that didn't solve the original poster's issue either. The problem is actually in the tweepy package itself, in the _read_loop() function in streaming.py, which I think needs to be updated to reflect changes in the format that Twitter outputs from its streaming API.

The solution for me was to download the newest tweepy code from GitHub, https://github.com/tweepy/tweepy, specifically the streaming.py file. The commit history for that file shows the recent changes made to try to resolve this issue.

I looked into the details of the tweepy code and found a problem with how streaming.py reads in the JSON tweet stream. I think this is because Twitter updated the streaming API to include the number of bytes of each incoming status. Long story short, here is the function I replaced in streaming.py to resolve this issue.



def _read_loop(self, resp):

    while self.running and not resp.isclosed():

        # Note: keep-alive newlines might be inserted before each length value.
        # read until we get a digit...
        c = '\n'
        while c == '\n' and self.running and not resp.isclosed():
            c = resp.read(1)
        delimited_string = c

        # read rest of delimiter length..
        d = ''
        while d != '\n' and self.running and not resp.isclosed():
            d = resp.read(1)
            delimited_string += d

        try:
            int_to_read = int(delimited_string)
            next_status_obj = resp.read( int_to_read )
            # print 'status_object = %s' % next_status_obj
            self._data(next_status_obj)
        except ValueError:
            pass 

    if resp.isclosed():
        self.on_closed(resp)
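
To see the idea in isolation, here is a minimal standalone sketch of the same length-prefixed reading, run against in-memory data instead of a live connection. It is not tweepy's code: the names read_delimited and frame and the sample payloads are made up for illustration, and it assumes each record is a decimal length, a newline, and then exactly that many bytes.

```python
import io


def frame(payload):
    """Hypothetical helper: prefix a payload with its byte length and a newline."""
    return str(len(payload)).encode("ascii") + b"\n" + payload


def read_delimited(stream):
    """Yield each payload from a stream of '<length>\\n<payload>' records."""
    while True:
        # Skip keep-alive newlines until we reach the start of a length value.
        c = stream.read(1)
        while c in (b"\n", b"\r"):
            c = stream.read(1)
        if not c:
            return  # end of stream

        # Read the rest of the length line, up to and including the newline.
        length_line = c
        while not length_line.endswith(b"\n"):
            d = stream.read(1)
            if not d:
                return  # stream ended in the middle of a length line
            length_line += d

        try:
            size = int(length_line.decode("ascii").strip())
        except ValueError:
            continue  # not a length line, so skip it and keep going

        # Read exactly one status, no more and no less.
        yield stream.read(size)


# Two fake statuses with keep-alive newlines before and between them.
sample = b"\n\n" + frame(b'{"text": "first"}') + b"\n" + frame(b'{"text": "second"}')
for payload in read_delimited(io.BytesIO(sample)):
    print(payload.decode("utf-8"))
```

Because the reader knows each status's length up front, it can hand over a status as soon as its last byte arrives rather than waiting for whatever comes next in the stream.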

      

This solution also requires learning how to download the source code for the tweepy package, modify it, and install the modified library in Python. To install it, go to your top-level tweepy directory and run something like sudo python setup.py install, depending on your system.

I also commented to the github coders for this package to let them know what's going on.



This is a case of output buffering. Start python with -u (unbuffered) to prevent this from happening.

Or you can force the buffer to be flushed by executing sys.stdout.flush() after your print statement. In the meantime, there is no need to worry about it.
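
As a rough sketch of the flush approach, the helper below writes one line and flushes it straight away. The name print_flushed and the sample line are made up for illustration. This mainly matters when stdout is block-buffered, for example when the output is redirected to a file or a pipe.

```python
import sys


def print_flushed(text, stream=None):
    """Write one line of output and flush it immediately."""
    if stream is None:
        stream = sys.stdout
    stream.write(text + "\n")
    # Without this call, a block-buffered stream may hold the line back
    # until enough further output has accumulated.
    stream.flush()


print_flushed("tweet text\tscreen_name\tcreated_at\tsource")
```

In on_status you would call something like this in place of the bare print. In Python 3, print(text, flush=True) has the same effect.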



See this answer for more ideas.
