By William Nee
Senior Director, Market Intelligence
In part 2 of this series (located here), we registered our Twitter application, and in response, Twitter approved our app and provided us with the tokens we’ll need to access Twitter data.
This is a good time to review the considerable restrictions on both the amount of data and the frequency with which it can be retrieved that Twitter imposed beginning in July 2018. This policy has been implemented in the form of three tiers, each with its own granted level of access as defined in the screenshot below:
Note that the biggest restriction on the Standard (free) tier is on how far back Tweets can be retrieved: searches are limited to the past seven days. That's a considerable restriction, but it still gives us access to a large number of Tweets when one considers the heavy volume of data flowing through Twitter on an hourly basis.
Now back to coding our application… Since we’ve now been granted permission to create an application, when we log onto the Twitter developer site we’re greeted by a screen like the one below. This shows us the data access restrictions our account is subject to.
If the restrictions imposed by Standard tier access prevent you from accomplishing what you want to achieve with your application, you could of course consider purchasing access to one of the other tiers. There are also other options. For example, you could schedule a query/retrieval program to run on a weekly basis using crontab or Windows Scheduler (depending on the OS you’re running on), and then store those results in a data store for safekeeping and analysis. Because each weekly run covers the previous seven days, the stored results gradually build up a history longer than the Standard tier allows in any single query.
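As a rough illustration of the "store those results" step, here is a minimal sketch that saves retrieved Tweets to a local SQLite database and skips any Tweet it has already stored, so overlapping weekly runs don't create duplicates. The table layout, the function names (`init_store`, `save_tweets`) and the dictionary keys are assumptions made for this example, not part of the application we build in this series. A scheduled run could then be triggered by a crontab entry along the lines of `0 6 * * 1 python3 /path/to/fetch_tweets.py` (a hypothetical script path; this runs every Monday at 06:00).

```python
import sqlite3


def init_store(conn):
    """Create the (hypothetical) tweets table if it doesn't exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               tweet_id   TEXT PRIMARY KEY,
               created_at TEXT,
               text       TEXT
           )"""
    )
    conn.commit()


def save_tweets(conn, tweets):
    """Insert Tweets, ignoring ones already stored. Returns the number of new rows."""
    before = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (tweet_id, created_at, text) VALUES (?, ?, ?)",
        [(t["tweet_id"], t["created_at"], t["text"]) for t in tweets],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
    return after - before


# Example usage with placeholder rows (not real Tweets).
conn = sqlite3.connect(":memory:")  # use a file path such as "tweets.db" to persist
init_store(conn)
placeholder_rows = [
    {"tweet_id": "1", "created_at": "2019-01-01 12:00:00", "text": "placeholder text A"},
    {"tweet_id": "2", "created_at": "2019-01-02 08:30:00", "text": "placeholder text B"},
]
first_run = save_tweets(conn, placeholder_rows)
second_run = save_tweets(conn, placeholder_rows)  # same rows again: nothing new
```

Using the Tweet ID as the primary key is what makes the weekly runs safe to overlap; any other store with a uniqueness constraint would work just as well.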
We’re almost ready to begin coding our application! But before we go any further it’s probably a good time to touch upon some of the technology we’ll be using to develop our program.
- Python – We’ll be writing our program in Python, a popular and powerful scripting language. There are several Python libraries/wrappers that provide access to the Twitter API. We’ll be using the popular Tweepy library, but we could just as well have chosen a different one (for example, Twython is also very popular).
- Pandas – A popular library of data structures and analysis tools that’s often used in conjunction with Python. We’ll be using Pandas to present the retrieved data in a more readable, tabular format.
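To give a feel for what Pandas contributes, the sketch below turns a list of dictionaries into a DataFrame and prints it as a table. The rows are placeholders invented for this example (not real Tweets), and the column names are assumptions; the actual fields will depend on what we retrieve later in the series.

```python
import pandas as pd

# Placeholder rows standing in for retrieved Tweets (not real data).
sample_tweets = [
    {"created_at": "2019-01-01 12:00:00", "user": "example_user_1", "text": "Placeholder tweet text A"},
    {"created_at": "2019-01-02 08:30:00", "user": "example_user_2", "text": "Placeholder tweet text B"},
]

# Build a table with a fixed column order.
df = pd.DataFrame(sample_tweets, columns=["created_at", "user", "text"])

# Convert the timestamp strings to real datetimes so they can be sorted and filtered.
df["created_at"] = pd.to_datetime(df["created_at"])

# Print the table without the numeric index column.
print(df.to_string(index=False))
```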
After installing Python and an editor (if we choose to; I use a simple but effective one called Geany), we need to install Tweepy and Pandas. This can be done in a variety of ways; the most popular is to use Python’s package manager, pip.
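With pip, the installation is typically a single command such as `pip install tweepy pandas`. To confirm that both libraries are visible to the Python interpreter you plan to use, a quick check like the following can help. The helper name `missing_packages` is made up for this sketch.

```python
import importlib.util


def missing_packages(names):
    """Return the names from the list that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(["tweepy", "pandas"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("Tweepy and Pandas are both installed.")
```

If a package is reported missing even though pip says it installed, pip may belong to a different Python installation; running `python -m pip install tweepy pandas` with the same `python` you use to run the script usually resolves that.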
We now need to gain access to the Twitter API through the developer credentials we’ve been granted. As we discussed earlier, these keys and tokens are equivalent to passwords, and therefore ideally shouldn’t be included in your code as plain text. One common method used to secure this important information is to store it in an external file that will only be accessed when the program is executed. For this example, we’ll be writing our keys and tokens to the file credentials.py. The contents of this file are shown below (note that you’ll be entering your own assigned credentials where the text says, ‘your assigned … here’):
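A credentials.py file along these lines would do the job. The four variable names are assumptions chosen for this sketch; what matters is that the file holds the consumer key, consumer secret, access token and access token secret that Twitter assigned to the app.

```python
# credentials.py
# Replace each placeholder with the value assigned to your app.
# Variable names are illustrative; use whatever names your main program imports.
CONSUMER_KEY = "your assigned consumer key here"
CONSUMER_SECRET = "your assigned consumer secret here"
ACCESS_TOKEN = "your assigned access token here"
ACCESS_TOKEN_SECRET = "your assigned access token secret here"
```

If the project is kept under version control, it's worth excluding this file (for example via `.gitignore`) so the credentials are never published by accident.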
Now, we’ll load our assigned keys and tokens from the file credentials.py, and pass those on to Twitter via OAuth, which is the standard authentication method used for applications like ours.
Great! We’ve accomplished a lot in this article: we’ve decided on the software we’ll be using (e.g. Python, JSON, Pandas), installed the relevant libraries, and accessed Twitter using the credentials we’ve been assigned. In the next article we’ll start retrieving and working with Tweets!