Analyze Twitter with R
Hello again! This post is about how to analyze Twitter with R. In this article we will walk through the procedure for getting data from Twitter and saving it as a dataset in R. That dataset can later be used for sentiment analysis or, if the data are numerical, for trend analysis.
We will be using the twitteR package, available on CRAN. twitteR is an R package that provides access to the Twitter API and makes it easy to analyze Twitter with R. Most functionality of the API is supported, with a bias towards API calls that are more useful in data analysis as opposed to daily interaction. If you have not already installed it, you can install it as follows.
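A minimal setup, installing the package from CRAN and loading it into the session:

```r
# Install twitteR from CRAN (one-time) and load it
install.packages("twitteR")
library(twitteR)
```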
After you have installed twitteR, you will need to set up your Twitter account to get API keys so that you can make API requests against your Twitter account. Confusing? Hold on, it gets clearer.
Create a Twitter app at https://developer.twitter.com/en. Once you have logged in, you will have to apply for a developer account to get access to the Twitter API. Fill in the forms and get yourself registered. Once you are registered, you will see the Apps button in the top right corner of the screen.
Click Apps, then click Create an App on the next page. After you are done with app creation, go to the app settings and set the app permissions to Read, write and Direct messages.
After the permissions are set, go to Keys and tokens and generate the consumer API key and secret as well as the access token and access token secret.
Note down all four values somewhere safe, as we will use them to authenticate our access to Twitter.
Set up the Twitter OAuth with:
setup_twitter_oauth("API key", "API secret", "Access token", "Access secret")
Replace “API key”, “API secret”, “Access token” and “Access secret” with your generated keys and tokens.
That completes the setup. Now let's start with data collection.
Getting your first data
I always used to wonder what those silly hashtags do. Now I know: they are useful tags for data collection, and we can retrieve tweets that reference them.
tweets <- searchTwitter('#rstats', n=50)
In the above statement, we have defined a dataset called tweets.
- searchTwitter searches Twitter for the tag we specify
- ‘#rstats’ is the hashtag we use to get information from Twitter; you can use whichever one you want to analyze
- n=50 is the number of tweets we want to download
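searchTwitter returns a list of status objects rather than a plain table. As a quick sketch (assuming the reference-class accessors getText and getScreenName that twitteR status objects expose), you can pull out just the text and author of each tweet:

```r
# Extract the text and author of each downloaded tweet
tweet_text    <- sapply(tweets, function(t) t$getText())
tweet_authors <- sapply(tweets, function(t) t$getScreenName())

head(tweet_text)
```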
You can see the tweets you just downloaded with
head(tweets)
It will show you the first few entries of the dataset.
You might want to keep only tweets that originated with this hashtag and drop any retweets. This can be done using
head(strip_retweets(tweets, strip_manual=TRUE, strip_mt=TRUE))
Notice that in the output above, the retweets have been removed.
Looking at users
To take a closer look at a Twitter user (including yourself!), use the getUser function. This will only work with users whose profiles are public, or, if you are authenticated, with users who have granted you access. You can also see things such as a user's followers, who they follow, retweets, and more. The getUser function returns a user object, which can then be polled for further information.
salman <- getUser('salman_aly')
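A sketch of polling the returned user object, assuming the standard twitteR reference-class accessors such as getDescription and getFollowersCount:

```r
# Poll the user object for profile details
salman$getDescription()     # profile bio
salman$getFollowersCount()  # number of followers
salman$getFriendsCount()    # number of accounts they follow
salman$getStatusesCount()   # number of tweets posted
```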
Conversion to data.frames
There are times when it is convenient to display the object lists as a data.frame structure. To do this, every class has a reference method toDataFrame as well as a corresponding S4 method as.data.frame that works in the traditional sense. Converting a single object will typically not be particularly useful by itself, but there is a convenience method to convert an entire list, twListToDF, which takes a list of objects from a single twitteR class:
df <- twListToDF(tweets)
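Once converted, the data.frame can be explored with ordinary R tools. A small sketch, assuming twListToDF on status objects produces columns such as text, screenName and created:

```r
# Inspect the columns twListToDF produced
str(df)

# How many tweets each user contributed, most active first
sort(table(df$screenName), decreasing = TRUE)
```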
You might wonder how to retrieve data from the past; generally people are doing a study on some major event that has already happened (e.g. the Arab Spring, an election, etc.). Using the Twitter API this is impossible, as you can only go back a small amount. However, if you have the ability to look ahead, it is easy to enable a prospective study by collecting data and automatically persisting it to a database. This allows you to load everything into a later R session, including with tools such as dplyr.
There’s a full writeup of this functionality at http://geoffjentry.blogspot.com/2014/02/twitter-now-supports-database.html.
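A sketch of that persistence workflow, assuming twitteR's database backend functions register_sqlite_backend, store_tweets_db and load_tweets_db (see the writeup linked above for details):

```r
# Register a SQLite file as the storage backend
register_sqlite_backend("tweets.db")

# Persist the tweets collected in this session
store_tweets_db(tweets)

# In a later R session, load everything back and convert to a data.frame
old_tweets <- load_tweets_db()
old_df <- twListToDF(old_tweets)
```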
Timelines
A Twitter timeline is simply a stream of tweets. We support two timelines: the user timeline and the home timeline. The former provides the most recent tweets of a specified user, while the latter displays your own most recent tweets. Both return a list of status objects. To look at a particular user's timeline, that user must either have a public account or you must have access to their account. You can pass in either the user's name or an object of class user, as described above. For this example, let's use the user cranatic.
cran_tweets <- userTimeline('cranatic')
By default this command returns the 20 most recent tweets. As with most (but not all) of the functions, it also provides a mechanism to retrieve an arbitrarily large number of tweets, up to limits set by the Twitter API, which vary based on the specific type of request.
cran_tweets_large <- userTimeline('cranatic', n=100)
The homeTimeline function works the same way for your own home timeline.
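For example, a sketch that assumes the authenticated session set up earlier:

```r
# Fetch the 15 most recent tweets from your home timeline
home_tweets <- homeTimeline(n=15)
head(home_tweets)
```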
Trends
Twitter keeps track of topics that are popular at any given point in time and allows you to extract that data. The getTrends function pulls current trend information for a given location, which is specified using a WOEID (Yahoo! Where On Earth ID). Luckily there are two other functions to help you identify WOEIDs you might be interested in. The availableTrendLocations function returns a data.frame with a location in each row and a woeid column giving that location's WOEID. Similarly, the closestTrendLocations function is passed a latitude and longitude and returns the trend locations closest to that point, again as a data.frame.
avail_trends <- availableTrendLocations()
close_trends <- closestTrendLocations(-42.8, -71.1)
trends <- getTrends(2367105)
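To find the WOEID for a place you care about, you can filter the availableTrendLocations result by name. A sketch, assuming the returned data.frame has name and woeid columns as described above:

```r
# Look up the WOEID for a specific city by name,
# then fetch the current trends for it
boston <- avail_trends[avail_trends$name == "Boston", ]
boston_trends <- getTrends(boston$woeid)
```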
Today you learned how to analyze Twitter with R.