Plotting the number (frequency) of tweets containing a word, per month

I connected R to Twitter and am scraping tweets with the searchTwitter function, then cleaning the resulting data (removing punctuation, converting to lowercase, etc.). Now I am trying to do the following:

  • Count the number of tweets containing the word "auction" that were posted per month from January 2015 to the end of July 2015.
  • Plot the result as a simple histogram (x-axis: month; y-axis: number of tweets).

I would like to reuse this for retweets, mentions, replies, and favorites.

This is what I have tried so far:

# Load the packages into R
library(twitteR)
library(plyr)
library(ggplot2)

# Register an application (API) at https://apps.twitter.com/
# Look up the API key and create an access token; you need both the key and the secret for each
# Assign the keys to variables and authorize the session
api_key <- "your API key from Twitter"
api_secret <- "your API secret from Twitter"
access_token <- "your access token from Twitter"
access_token_secret <- "your access token secret from Twitter"
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)

      

[1] "Using direct authentication"
Use a local file to cache OAuth access credentials between R sessions?
1: Yes
2: No
# Type 1 and press Enter.
Selection: 1

auctiontweets <- searchTwitter("auction", since = "2015-01-01", until = "2015-08-03", n=1000)

      

However, I am unable to create the data frame. I get the following error:

tweet.dataframe <- data.frame(searchTwitter("auction", since = "2015-01-01", until = "2015-08-03", n = 3000))

      

Error in as.data.frame.default(x[[i]], optional = TRUE) :
    cannot coerce class "structure("status", package = "twitteR")" to a data.frame

I found some code that plots users against the time of their tweets, but I couldn't adapt it to count tweets containing a specific word (i.e. "auction") per month:

yultweets <- searchTwitter("#accessyul", n=1500)
y <- twListToDF(yultweets)
y$created <- as.POSIXct(format(y$created, tz="America/Montreal"))
yply <- ddply(y, .variables = "screenName", .fun = function(x) {
  return(subset(x, created %in% min(created), select = c(screenName, created)))
})
yplytime <- arrange(yply,-desc(created))
y$screenName=factor(y$screenName, levels = yplytime$screenName)
ggplot(y) + geom_point(aes(x=created,y=screenName)) + ylab("Twitter username") + xlab("Time")

      

The source can be found here .

1 answer


Since you haven't provided even a small sample of your data to work with, my answer may be somewhat superficial.

> library(stringi); library(dplyr); library(ggplot2); library(SciencesPo)

> # Toy data standing in for the cleaned tweets
> df <- data.frame(tweets = c("blah, blah, Blah, auction", "blah, auction", "blah, blah", "this auction, blah", "today"),
+                  date = c("2015-07-01", "2015-06-01", "2015-05-01", "2015-07-31", "2015-05-01"),
+                  stringsAsFactors = FALSE)
> df
                     tweets       date
1 blah, blah, Blah, auction 2015-07-01
2             blah, auction 2015-06-01
3                blah, blah 2015-05-01
4        this auction, blah 2015-07-31
5                     today 2015-05-01

> keyword <- "auction"

> # Count the occurrences of the keyword in each tweet
> df$n <- vapply(df$tweets, function(x) sum(stri_count_fixed(x, keyword)), 1L)
> df
                     tweets       date n
1 blah, blah, Blah, auction 2015-07-01 1
2             blah, auction 2015-06-01 1
3                blah, blah 2015-05-01 0
4        this auction, blah 2015-07-31 1
5                     today 2015-05-01 0
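Note that the call above counts occurrences with a case-sensitive fixed match, so a tweet that mentions the word twice contributes 2 and "Auction" is not matched. If the goal is the number of tweets containing the word, a 0/1 flag per tweet may be closer. A minimal sketch with base R's grepl, using the same toy data (the has_word column name is an assumption introduced here, not part of the original code):

```r
# Toy data mirroring the example above (not real tweets).
df <- data.frame(
  tweets = c("blah, blah, Blah, auction", "blah, auction", "blah, blah",
             "this auction, blah", "today"),
  date = c("2015-07-01", "2015-06-01", "2015-05-01", "2015-07-31", "2015-05-01"),
  stringsAsFactors = FALSE
)

# 1 if the tweet mentions the word at least once, 0 otherwise.
# ignore.case = TRUE also matches "Auction" and "AUCTION".
df$has_word <- as.integer(grepl("auction", df$tweets, ignore.case = TRUE))
df$has_word
```

This pattern also matches longer words such as "auctions"; a pattern like "\\bauction\\b" would restrict it to the whole word. Summing has_word instead of n in the next step would then count tweets rather than occurrences.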

      

Then you only need to summarize:



> df2 <- df %>%
+   group_by(month = format(as.Date(date), format = "%m")) %>%
+   summarize(freq = sum(n)) %>%
+   ungroup()

> df2
Source: local data frame [3 x 2]

  month freq
1    05    0
2    06    1
3    07    2
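The summary above only lists months that appear in the data, and grouping on "%m" alone would merge the same month from different years. If every month from January to July 2015 should show up, including those with no matching tweets, one possible base R sketch is below. The tweets data frame and its has_word column are toy assumptions made for illustration:

```r
# Toy data: one row per tweet, with a 0/1 flag for the keyword (assumed columns).
tweets <- data.frame(
  date = as.Date(c("2015-07-01", "2015-06-01", "2015-05-01", "2015-07-31", "2015-05-01")),
  has_word = c(1L, 1L, 0L, 1L, 0L)
)

# All months in the window of interest, as "YYYY-MM" labels.
all_months <- format(seq(as.Date("2015-01-01"), as.Date("2015-07-01"), by = "month"), "%Y-%m")

# Turn each date into a month label; fixing the factor levels keeps empty months.
month <- factor(format(tweets$date, "%Y-%m"), levels = all_months)

# Sum the flags per month; months without any rows come back as NA, so set them to 0.
freq <- tapply(tweets$has_word, month, sum)
freq[is.na(freq)] <- 0L

monthly <- data.frame(month = all_months, freq = as.integer(freq),
                      stringsAsFactors = FALSE)
monthly
```

The same pattern should carry over to other numeric per-tweet columns. For example, with a data frame built by twListToDF, summing a column such as retweetCount or favoriteCount instead of has_word would likely give monthly retweet or favorite totals.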

      

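Voilà! As a bonus, you can plot it like this (group = 1 is needed so that geom_line connects the points when month is a character column):

> ggplot(df2, aes(x = month, y = freq, group = 1)) + geom_line() + theme_pub()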
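Since the question asks for bars rather than a line, here is a hedged sketch of a bar chart built from the same kind of monthly summary. The df2 values are retyped by hand from the output above, and geom_col assumes ggplot2 2.2.0 or later:

```r
library(ggplot2)

# Monthly counts as produced by the summary step (toy values).
df2 <- data.frame(month = c("05", "06", "07"), freq = c(0, 1, 2))

# geom_col() draws one bar per row, with its height taken from freq.
p <- ggplot(df2, aes(x = month, y = freq)) +
  geom_col() +
  xlab("Month") +
  ylab("Number of tweets")
print(p)
```

Strictly speaking this is a bar chart of precomputed counts, not a histogram. geom_histogram would instead bin the raw tweet dates itself.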
