Using the Twitter API

Twitter has an API that is very easy to use, but you only have access to the most recent tweets and to a limited number of calls. See some of the limitations here:

Setting it up

Setting up the authentication is the hardest part; after that, getting some text is very easy. If you allow it to cache the authentication, you won’t have to follow these steps again.

Here’s my page

  1. Set up a Twitter account (or use an existing account)

  2. Create an app here: Put anything under ‘Website’ (I used google), and as the Callback URL.

  3. Get your consumer key and consumer secret from the “Keys and Access Tokens” page; they should be the first two options. Save these as key and secret in R.

  4. Load the twitteR library and run the setup_twitter_oauth() function.

Now you have two choices. The first is to authenticate using your browser, which is not great if you are planning to do this many times.

  1. It will ask whether you want to cache your access credentials; say Yes.

  2. It will pop up a page to authorize your computer; sign in.

The second option is to supply two more pieces of information, the access token and the access token secret, both found on the same page as the consumer key and consumer secret.

  1. Get your access token and access token secret from the “Keys and Access Tokens” page.

  2. Add them to the setup_twitter_oauth() function.

library(twitteR)

key <- "YOUR_CONSUMER_KEY"            # I hid mine from you
secret <- "YOUR_CONSUMER_SECRET"      # I hid mine from you
accessToken <- "YOUR_ACCESS_TOKEN"    # I hid mine from you
accessSecret <- "YOUR_ACCESS_SECRET"  # I hid mine from you

setup_twitter_oauth(key, secret, accessToken, accessSecret)
## [1] "Using direct authentication"
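
Rather than pasting the four values into a script, one option is to keep them in environment variables (for example in ~/.Renviron) and read them at run time. The sketch below is only one way to do this: the helper read_twitter_credentials() and the variable names such as TWITTER_CONSUMER_KEY are my own assumptions, not something twitteR requires.

```r
# Hypothetical helper: read the four credentials from environment variables.
# The variable names are assumptions; use whatever names you set yourself.
read_twitter_credentials <- function(
    vars = c(key          = "TWITTER_CONSUMER_KEY",
             secret       = "TWITTER_CONSUMER_SECRET",
             accessToken  = "TWITTER_ACCESS_TOKEN",
             accessSecret = "TWITTER_ACCESS_SECRET")) {
  values <- Sys.getenv(vars, unset = NA)  # NA marks a variable that is not set
  names(values) <- names(vars)
  missing <- is.na(values) | values == ""
  if (any(missing)) {
    stop("Missing environment variables: ",
         paste(vars[missing], collapse = ", "))
  }
  as.list(values)
}

# Usage, once the variables are set and twitteR is loaded:
# creds <- read_twitter_credentials()
# setup_twitter_oauth(creds$key, creds$secret, creds$accessToken, creds$accessSecret)
```

Failing early with the names of the missing variables is easier to debug than an authentication error further down the script.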


Now let’s find some tweets.

Suppose we want to get the information from a list of profiles. Let’s see what Khamenei, the Supreme Leader of Iran, has been saying recently.

The userTimeline command pulls a lot of information from a user’s timeline, and we can tell it to exclude retweets and replies.

page <- list("khamenei_ir")
# for (i in seq_along(page)) {  # loop over several profiles; here we only use the first
i <- 1
currentpage <- page[[i]]
tweets <- userTimeline(currentpage, n = 100, includeRts = FALSE, excludeReplies = TRUE)

twListToDF turns these tweets into a dataframe.

tweets <- twListToDF(tweets)
head(tweets[,c("text")])  #here are the most recent tweets
## [1] "If we want to adopt a logical, precise view of issues concerning women, the 1st prerequisite is to totally cleanse…"     
## [2] "Commemorating and paying tribute to the lofty martyrs signifies continuing their movement. Our enemies do not want…"     
## [3] "#Iran and #Armenia are good neighbors, enjoying historical relations. Contrary to what the U.S. desires, the ties b…"    
## [4] "The enemies of Truth and God's religion did not emerge recently; they've always been there, lining up. Today the fr…"    
## [5] "The key to #Syria's victory &amp;US and its regional mercenaries' defeat is Syrian president and people's resolve and r…"
## [6] "People who are fatigued, depressed and sluggish won’t find friends or companions. If you want to do teamwork, work…"
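
The commented-out loop above hints at running the same steps for several profiles. A minimal sketch of that idea follows. collect_timelines() is a hypothetical helper, and its fetch argument is an assumption of mine: any function that takes a screen name and returns a data frame. That way the combining logic can be tried out without calling the API, here with made-up rows.

```r
# Hypothetical helper: fetch one data frame per profile and stack them.
collect_timelines <- function(pages, fetch) {
  frames <- lapply(pages, function(p) {
    df <- fetch(p)
    df$page <- rep(p, nrow(df))  # remember which profile each row came from
    df
  })
  do.call(rbind, frames)
}

# Stand-in for the API call, returning made-up rows
fake_fetch <- function(p) {
  data.frame(text = paste("example tweet from", p), stringsAsFactors = FALSE)
}

all_tweets <- collect_timelines(list("profile_a", "profile_b"), fake_fetch)

# With twitteR loaded and authenticated, fetch could instead be:
# function(p) twListToDF(userTimeline(p, n = 100, includeRts = FALSE, excludeReplies = TRUE))
```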

We could now run some topic models on this corpus of text.
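
Before fitting a topic model, the tweet text usually needs some cleaning; the output above still contains HTML entities such as &amp; and truncated endings. The following is a minimal base R sketch of one possible clean-up. The function name clean_tweet_text() and the exact rules are my own assumptions rather than a fixed recipe, and the example URL is a placeholder.

```r
# Hypothetical helper: light clean-up of tweet text before further analysis.
clean_tweet_text <- function(x) {
  x <- gsub("&amp;", "&", x, fixed = TRUE)       # decode the most common HTML entity
  x <- gsub("https?://\\S+", "", x)              # drop links
  x <- gsub("[^[:alnum:][:space:]#@']", " ", x)  # replace other punctuation with spaces
  x <- tolower(x)
  x <- gsub("\\s+", " ", x)                      # collapse repeated whitespace
  trimws(x)
}

cleaned <- clean_tweet_text("The key to #Syria's victory &amp;US ... https://example.com")
cleaned
```

Hashtags and mentions are kept on purpose here, since they often carry the topic; drop # and @ from the bracket expression if you would rather remove them.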


Alternatively, we might want to find anyone who is feeling particularly blessed using the #alhamdulillah hashtag. Multiple query terms can be combined using the + separator. We can extract a lot of information, including, of course, the text itself. We could also set limits on the geographic area or the time frame.

tweets <- searchTwitter('#alhamdulillah', n=100)

txt <- sapply(tweets, function(x) x$getText())
head(txt)
## [1] "RT @muftimenk: Congratulations @MBuhari upon your reelection! May the Almighty protect you, continue to bless you &amp; guide you to the best d…"
## [2] "The things you take for granted, someone else is praying for...\n.\n#Alhamdulillah\U0001f338"                                                    
## [3] "Na dai faɗi haka ne domin ku sami ceto.#Alhamdulillah #Gaskiya #Arewa #Hausa"                                                                    
## [4] "#Alhamdulillah"                                                                                                          
## [5] "RT @Yasminemiah_: Get you friends who take you to see your favourite team at home ❤️ #Alhamdulillah Alhamdulillah"        
## [6] "Get you friends who take you to see your favourite team at home ❤️ #Alhamdulillah Alhamdulillah"

Check what other information we can get from these tweets with ?status
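
The + separator and the geographic and time limits mentioned earlier can be combined roughly as in the sketch below. build_query() is a hypothetical helper; the second hashtag, the coordinates, and the dates are placeholders of mine. The search call itself is left commented out because it needs an authenticated session.

```r
# Hypothetical helper: join several search terms with the + separator.
build_query <- function(terms) paste(terms, collapse = "+")

query <- build_query(c("#alhamdulillah", "#blessed"))
query

# tweets <- searchTwitter(query, n = 100,
#                         since = "2019-02-20", until = "2019-02-27",  # placeholder dates
#                         geocode = "9.08,8.68,500km")                 # placeholder "latitude,longitude,radius"
```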

Map retweets

We can also look for patterns in the retweets and make an extraneous figure that looks pretty cool but doesn’t really tell us that much. I got these steps from this page:

Search the text for string patterns indicating a retweet

rt = grep("(RT|via)((?:\\b\\W*@\\w+)+)", txt)
rt  # these are retweets
##  [1]  1  5  7 10 12 13 16 23 26 28 29 35 37 38 41 42 43 47 50 52 53 64 66
## [24] 72 79 88 90 91 92 94 95 96 98
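
To see what the pattern picks up, it can help to try it on a few made-up strings first. In this sketch the sample tweets and user names are invented, and perl = TRUE is an addition of mine so that the non-capturing group is handled by the PCRE engine.

```r
# Made-up sample tweets: a retweet, a plain tweet, and a "via" credit
sample_txt <- c("RT @alice: good news",
                "no mention of anyone here",
                "lovely photo via @bob")
rt_pattern <- "(RT|via)((?:\\b\\W*@\\w+)+)"

# Indices of the strings that look like retweets
matches <- grep(rt_pattern, sample_txt, perl = TRUE)

# The matched part of each retweet, then the bare user name
sources <- regmatches(sample_txt, regexpr(rt_pattern, sample_txt, perl = TRUE))
users <- gsub("(RT @|via @)", "", sources)
users
```

Only the first and third strings match, and stripping the "RT @" or "via @" prefix leaves the name of the account being credited.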

library(stringr)  # for str_extract_all()

# create lists to store user names
who_retweet = vector("list", length(rt)); who_post = vector("list", length(rt))

for (i in seq_along(rt)) {
  twit = tweets[[rt[i]]]  # get tweet with retweet entity
  poster = str_extract_all(twit$getText(), "(RT|via)((?:\\b\\W*@\\w+)+)")  # get retweet source
  poster = gsub(":", "", unlist(poster))  # remove ':'
  who_post[[i]] = gsub("(RT @|via @)", "", poster)  # name of retweeted user
  who_retweet[[i]] = rep(twit$getScreenName(), length(poster))  # name of retweeting user
}

who_post = unlist(who_post); who_retweet = unlist(who_retweet)  # flatten to character vectors

Options for setting up the graph; I borrowed these from the page mentioned above.


library(igraph)  # graph construction, layout and plotting

# two column matrix of edges
retweeter_poster = cbind(who_retweet, who_post)

# generate graph
rt_graph = graph.edgelist(retweeter_poster)

# get vertex names
ver_labs = get.vertex.attribute(rt_graph, "name", index=V(rt_graph))

# choose some layout
glay = layout.fruchterman.reingold(rt_graph)

# plot
par(bg="gray95", mar=c(1,1,1,1))
plot(rt_graph, layout=glay,
   edge.color=hsv(h=.95, s=1, v=.7, alpha=0.5))
# add title
title("\nTweets with '#Al Hamdulillah'",
   cex.main=1, col.main="black")
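
Since the figure is more decorative than informative, a plain count of who gets retweeted most often can be a useful companion. The sketch below uses an invented vector in place of who_post so that it runs on its own; the user names are placeholders.

```r
# Made-up stand-in for who_post (one entry per retweet found)
who_post_example <- c("user_a", "user_b", "user_a", "user_c", "user_a")

# Count how often each account was retweeted, most retweeted first
retweet_counts <- sort(table(who_post_example), decreasing = TRUE)
head(retweet_counts, 3)
```

With the real data, replacing who_post_example with who_post gives the same summary for the accounts in the graph.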