Which Twitter API should I use to fetch large numbers of tweets for NLP research?

I would like to extract as many tweets as possible that contain a given keyword (usually a company name).

I'm using the Twitter Search API, but it's limited to "recent tweets". So for a relatively rare keyword, I can only get 500 tweets at most.

Twitter says you shouldn't use the search API for research. So which API should I use?



2 answers

Twitter does not provide free access to historical data. Datasift and Gnip sell access to the Twitter firehose.



To collect a large number of tweets containing specific keywords, use the Streaming API with the statuses/filter method.

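First, create a file (e.g. "tracking.txt") containing the track parameter: the phrases to match, separated by commas. Phrases can include hashtags. Commas act as OR and a space within a phrase acts as AND. For example, I used the following to get tweets that contain a link and one of several hashtags.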

track=http #baby,http #family,http #children, ...
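If the list of phrases is long, the track file can be generated rather than typed by hand. The following is only a sketch: the helper make_track_file and the input file keywords.txt (one phrase per line) are hypothetical names, not part of any Twitter tooling.

```shell
# Build a statuses/filter track file from a list of phrases.
# make_track_file and keywords.txt are hypothetical names for illustration.
make_track_file() {
  # $1: input file with one track phrase per line
  # $2: output file to pass to curl with -d @file
  # Drop blank lines, join the rest with commas, and prefix "track=".
  printf 'track=%s\n' "$(grep -v '^[[:space:]]*$' "$1" | paste -s -d, -)" > "$2"
}

# Example input reusing the phrases shown above.
printf '%s\n' 'http #baby' 'http #family' 'http #children' > keywords.txt
make_track_file keywords.txt tracking.txt
cat tracking.txt   # prints: track=http #baby,http #family,http #children
```

For a company name, keywords.txt would simply hold the name and any variants you want to match, one per line.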


Then use curl to redirect the stream to a file. Be sure to use your own Twitter username and password.

curl -d @tracking.txt https://stream.twitter.com/1/statuses/filter.json -uAnyTwitterUser:Password > stream.json
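While the stream is running, it can be handy to check how many tweets have been captured so far. This is a rough sketch that assumes the stream writes one JSON object per line, with blank keep-alive lines in between. The helper count_tweets and the file sample.json are hypothetical and only stand in for a real stream.json.

```shell
# Count the tweets captured so far in a stream dump.
# Assumes one JSON object per line, separated by blank keep-alive lines.
count_tweets() {
  # $1: stream dump file, e.g. stream.json
  # Strip carriage returns, then count the lines that still have content.
  tr -d '\r' < "$1" | grep -c .
}

# Hypothetical two-tweet sample standing in for a real stream.json.
printf '{"text":"first"}\r\n\r\n{"text":"second"}\r\n' > sample.json
count_tweets sample.json   # prints: 2
```

Running count_tweets stream.json from a second terminal gives a quick idea of how fast a rare keyword is accumulating.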



