Unicode decode error while fetching Twitter data using Python

When getting Twitter data for a specific Arabic keyword like this:

#imports
from tweepy import Stream
from tweepy import OAuthHandler 
from tweepy.streaming import StreamListener

#setting up the keys
consumer_key = '………….' 
consumer_secret = '…………….'
access_token = '…………..'
access_secret = '……...'

class TweetListener(StreamListener):
    # A listener handles tweets as they are received from the stream.
    # This basic listener just prints received tweets to standard output.

    def on_data(self, data):
        print(data)
        return True

    def on_error(self, status):
        print(status)

# print all matching tweets to standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

stream = Stream(auth, TweetListener())
stream.filter(track=['سوريا'])

      

I got this error message:

Traceback (most recent call last):
  File "/Users/Mona/Desktop/twitter.py", line 29, in <module>
    stream.filter(track=['سوريا'])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 303, in filter
    encoded_track = [s.encode(encoding) for s in track]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128)

      

Please, help!


1 answer


I looked at the tweepy source code and found the line in Stream.filter that seems to be causing the problem. When you call stream.filter(track=['سوريا']) in your code, Stream calls s.encode('utf-8') where s = 'سوريا' (looking at the source of filter, utf-8 is the default encoding). It is at this point that the code throws an exception: in Python 2, 'سوريا' is a byte string, so calling encode on it first decodes it implicitly with the ascii codec, and that decode fails on the first non-ASCII byte (0xd8).
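The failure can be reproduced without tweepy or a Twitter connection. The following is only a minimal sketch: implicit_encode is a hypothetical helper (not part of tweepy) that imitates what Python 2 does when encode is called on a byte string, and raw stands in for the bytes that a plain '...' literal holds in Python 2.

```python
# -*- coding: utf-8 -*-
# Sketch only: implicit_encode is a hypothetical helper, not a tweepy function.

keyword = u"سوريا"              # Unicode string
raw = keyword.encode("utf-8")   # the bytes a plain '...' literal holds in Python 2


def implicit_encode(byte_string, encoding="utf-8"):
    # Imitate Python 2's str.encode(): decode as ASCII first, then encode.
    return byte_string.decode("ascii").encode(encoding)


try:
    implicit_encode(raw)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Encoding the Unicode string directly, as filter does with each track item,
# needs no implicit decode and succeeds.
encoded_track = [s.encode("utf-8") for s in [keyword]]
print(encoded_track == [raw])
```

The first print shows the same 'ascii' codec message as the traceback in the question; the second confirms that encoding the Unicode string yields the expected UTF-8 bytes.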

To fix this, pass a Unicode string (note the u prefix) instead of a byte string:



 t = u"سوريا"
 stream.filter(track=[t])

      

(I only put your keyword in the variable t for clarity.)
