Filter data in Twitter Streaming API
I am currently experimenting with the Twitter Streaming API. Everything works like a charm, but the API is sending me tons of data that I don't need. Is there a way to filter the data that the API sends me?
I am using the following stream: https://stream.twitter.com/1.1/statuses/filter.json
Take a look at the filter stream API.
You can supply a set of keywords for Twitter to track; under the current limits, you can track up to 400 keywords.
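As a rough illustration of the keyword filter, here is a minimal Python sketch that builds the form-encoded request body for the filter endpoint from the question. It assumes keywords are passed as a comma-separated `track` parameter; the helper name `build_track_params` and the constant `MAX_TRACK_KEYWORDS` are made up for this example, and the OAuth signing and the HTTP request itself are left out.

```python
from urllib.parse import urlencode

# Endpoint from the question; the request itself (and OAuth) is not shown here.
FILTER_URL = "https://stream.twitter.com/1.1/statuses/filter.json"

# Limit quoted in this answer; check your own access level before relying on it.
MAX_TRACK_KEYWORDS = 400


def build_track_params(keywords):
    """Return a form-encoded body with a comma-separated `track` parameter.

    Hypothetical helper: trims whitespace, drops empty and duplicate
    keywords, and refuses lists longer than MAX_TRACK_KEYWORDS.
    """
    cleaned = []
    for keyword in keywords:
        keyword = keyword.strip()
        if keyword and keyword not in cleaned:
            cleaned.append(keyword)
    if not cleaned:
        raise ValueError("at least one keyword is required")
    if len(cleaned) > MAX_TRACK_KEYWORDS:
        raise ValueError(
            f"{len(cleaned)} keywords given, limit is {MAX_TRACK_KEYWORDS}"
        )
    return urlencode({"track": ",".join(cleaned)})


if __name__ == "__main__":
    # The duplicate " XYZ " is trimmed and dropped.
    print(build_track_params(["XYZ", "XYZ shoes", " XYZ "]))
```

The resulting string would be sent as the body of a signed POST to `FILTER_URL`. Validating the keyword count locally just gives a clearer error than a rejected request.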
After fetching the tweets, you still need to filter them yourself to remove noisy data.
So if you can describe what you are looking for with a set of keywords, you will get what you want, but there will always be some noise in your data, because it is almost impossible to pin something down exactly through simple keyword filtering.
For example, let's say you want to track all tweets associated with the XYZ brand. For tweets about brand XYZ, you can have one keyword set containing only "XYZ". The API will give you all the tweets containing XYZ, but suppose "XYZ" is also a meaningful word in some language: people who speak that language will tweet using that word, and you will get those tweets too. Also suppose there is a city called XYZ and people post check-ins from there. So at this point you need to filter out the tweets that are unrelated to your topic, either by language detection or by looking at contextual information. The key, though, is to provide keywords that are specific to the topic you want to cover.
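The second filtering pass could look roughly like the following Python sketch. It assumes the stream delivers one JSON object per line (with blank keep-alive lines in between) and that each tweet carries `text` and `lang` fields. The function names, the sample tweets, and the context terms are all invented for illustration, and plain substring matching is only a crude stand-in for real context analysis.

```python
import json


def is_relevant(tweet, allowed_langs, context_terms):
    """Keep a tweet only if its language is allowed and its text
    mentions at least one context term (case-insensitive substring)."""
    if tweet.get("lang") not in allowed_langs:
        return False
    text = tweet.get("text", "").lower()
    return any(term.lower() in text for term in context_terms)


def filter_stream(lines, allowed_langs, context_terms):
    """Yield the relevant tweets from an iterable of raw stream lines."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank keep-alive lines
            continue
        tweet = json.loads(line)
        if is_relevant(tweet, allowed_langs, context_terms):
            yield tweet


if __name__ == "__main__":
    # Made-up sample lines standing in for the real stream.
    sample_lines = [
        '{"text": "Loving my new XYZ sneakers", "lang": "en"}',
        "",
        '{"text": "xyz bir kelime", "lang": "tr"}',
        '{"text": "Just checked in at XYZ city hall", "lang": "en"}',
    ]
    for tweet in filter_stream(sample_lines, {"en"}, ["sneakers", "shoes"]):
        print(tweet["text"])
```

With these sample lines only the sneakers tweet survives: the second tweet is dropped by the language check and the check-in by the missing context term. Messages without a `lang` field are dropped as well.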
Greetings.
Check out the storm type project. It has examples of using the filter API with twitter4j.