Which Twitter API should I use to fetch large numbers of tweets for NLP research?

I would like to extract as many tweets as possible that contain a given keyword (usually a company name).

I'm using the Twitter Search API, but it's limited to "recent tweets". So for a relatively rare keyword, I can only get 500 tweets at most.

Twitter says you shouldn't use the search API for research. So which API should I use?

+3




2 answers


Twitter does not provide free access to historical data. DataSift and Gnip sell access to the Twitter firehose.



+2




To get lots of tweets containing specific keywords, use the Streaming API's statuses/filter method.

First, create a file (e.g. "tracking.txt") that holds the track parameter, with the terms separated by commas. Terms can include hashtags. For example, I used the following to get tweets that contain a link and one of several hashtags.

track=http #baby,http #family,http #children, ...
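If you have many terms, you can generate that line instead of typing it. Here is a rough sketch, assuming the format shown above: words inside one phrase must all match, and commas separate alternative phrases. The names `build_track_line` and `write_tracking_file` are made up for illustration, and phrases that contain commas are not handled.

```python
def build_track_line(phrases):
    """Join phrases into a single 'track=' line.

    Words inside one phrase are separated by spaces (all must match);
    phrases are separated by commas (any one may match).
    """
    # Collapse repeated whitespace and drop empty entries.
    cleaned = [" ".join(p.split()) for p in phrases if p.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty phrase is required")
    return "track=" + ",".join(cleaned)


def write_tracking_file(phrases, path="tracking.txt"):
    # Hypothetical helper: writes the line without a trailing newline.
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_track_line(phrases))
```

For example, `write_tracking_file(["http #baby", "http #family"])` should produce a file with the same shape as the line above.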

      



Then use curl to redirect the stream to a file. Be sure to use your own Twitter username and password.

curl -d @tracking.txt https://stream.twitter.com/1/statuses/filter.json -uAnyTwitterUser:Password > stream.json
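Once stream.json has some data in it, you will probably want to pull the tweet text out for your NLP work. Here is a minimal sketch, assuming the stream writes one JSON object per line, sends blank keep-alive lines, and mixes in non-tweet messages (such as delete notices) that have no "text" field. `iter_tweets` and `load_texts` are hypothetical names.

```python
import json


def iter_tweets(lines):
    """Yield tweet objects from an iterable of raw stream lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a truncated last line if curl was interrupted
        # Assumption: real tweets are JSON objects with a "text" field.
        if isinstance(obj, dict) and "text" in obj:
            yield obj


def load_texts(path="stream.json"):
    """Return the text of every tweet found in the captured stream file."""
    with open(path, encoding="utf-8") as f:
        return [tweet["text"] for tweet in iter_tweets(f)]
```

Because `iter_tweets` is a generator, you can also loop over the open file directly instead of loading everything into memory, which matters if you leave curl running for days.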

      

+4

