Filter data in Twitter Streaming API

I am currently experimenting with the Twitter Streaming API. Everything works like a charm, but the API is sending me tons of data that I don't need. Is there a way to filter the data that the API sends me?

I am using the following stream: https://stream.twitter.com/1.1/statuses/filter.json

+3




3 answers


Take a look at the documentation for the filter stream:

https://dev.twitter.com/docs/api/1.1/post/statuses/filter

You can pass a set of keywords for Twitter to track. Under the current restrictions, you can track up to 400 keywords.
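As a minimal sketch of how the keyword set might be prepared, the snippet below builds the form body for the filter endpoint. It assumes the keywords go into the endpoint's `track` parameter as a comma-separated list; the helper name `build_track_params` is hypothetical, the 400 limit is the one quoted above, and no request is actually sent (authentication is left out entirely).

```python
from urllib.parse import urlencode

FILTER_URL = "https://stream.twitter.com/1.1/statuses/filter.json"
MAX_TRACK_KEYWORDS = 400  # limit quoted in this answer; may change over time


def build_track_params(keywords):
    """Return the POST parameters for a keyword filter (hypothetical helper)."""
    # Drop empty entries and surrounding whitespace.
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("at least one keyword is required")
    if len(cleaned) > MAX_TRACK_KEYWORDS:
        raise ValueError("too many keywords: %d" % len(cleaned))
    # Keywords are joined with commas into a single `track` value.
    return {"track": ",".join(cleaned)}


params = build_track_params(["XYZ", "XYZ phone"])
body = urlencode(params)  # form-encoded body to POST to FILTER_URL
```

The resulting `body` would be sent to `FILTER_URL` by whatever OAuth-capable HTTP client you already use.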

After receiving the tweets, you still need to filter them yourself to remove noisy data.



So, if you can specify what you are looking for with a set of keywords, you will achieve what you want; but there will always be noise in your data, because it is almost impossible to pin something down exactly through simple keyword filtering.

For example, let's say you want to track all tweets about the brand XYZ. You can use a keyword set containing only "XYZ", and the API will give you every tweet containing XYZ. But suppose "XYZ" is also a word in some language: people who speak that language will tweet that word, and you will get those tweets too. Also suppose there is a city called XYZ and people post check-ins from it. At that point you need to filter out the tweets that are not related to your topic, either by language detection or by looking for contextual information. But the key is to provide keywords specific to the topic you want to cover.
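The post-filtering step could be sketched roughly as follows. This assumes each tweet has already been parsed from JSON into a dict with `text` and `lang` fields (`lang` being Twitter's own language guess), and it uses a hand-picked list of context words as a crude stand-in for real contextual analysis. The function name, the context terms, and the sample tweets are all made up for illustration.

```python
def is_relevant(tweet, wanted_lang="en", context_terms=()):
    """Keep a tweet only if its language and context look right (sketch)."""
    # Language check: drop tweets where "XYZ" is just an ordinary word
    # in another language.
    if tweet.get("lang") != wanted_lang:
        return False
    if not context_terms:
        return True
    # Context check: require at least one brand-related word in the text.
    text = tweet.get("text", "").lower()
    return any(term.lower() in text for term in context_terms)


# Made-up sample tweets, shaped like parsed stream messages.
tweets = [
    {"text": "My new XYZ phone just arrived", "lang": "en"},
    {"text": "xyz abc def", "lang": "tr"},
    {"text": "Checking in at XYZ city hall", "lang": "en"},
]

relevant = [t for t in tweets if is_relevant(t, "en", ["phone", "laptop"])]
```

A substring test like this is deliberately simple; a stricter filter would tokenize the text or use a proper language-detection library instead of trusting the `lang` field.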

Regards.

+6




The answer is "no" to the question "Is there a way (other than searching the text yourself) to find out which of the three keywords I entered in the filter a tweet matched?" You have to do it manually.
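Doing it manually might look like the sketch below. It only approximates the stream's matching: it assumes a keyword phrase matches when all of its words appear in the tweet text, ignoring case and word order, and it looks at the text alone rather than URLs or screen names. The function name `matched_keywords` is hypothetical.

```python
import re


def matched_keywords(text, keywords):
    """Return the keywords from `keywords` that appear in `text` (sketch)."""
    # Lower-cased word tokens of the tweet; "#XYZ" yields the token "xyz".
    tokens = set(re.findall(r"\w+", text.lower()))
    matches = []
    for phrase in keywords:
        terms = re.findall(r"\w+", phrase.lower())
        # A phrase counts as matched when every one of its words is present.
        if terms and all(term in tokens for term in terms):
            matches.append(phrase)
    return matches


hits = matched_keywords("Loving my new #XYZ phone", ["xyz", "abc", "new phone"])
```

Running this against each incoming tweet with the same keyword list you passed to the filter tells you which keyword (or keywords) brought the tweet in.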



0




Check out the storm type project. It has examples of using the filter API with twitter4j.

-1

