APIs

An API is an Application Programming Interface.

An API is an interface that enables interaction (communication) between applications. In simpler terms, you can use an API to get data from Twitter, news sites, Wiki, etc.

But, let’s back up and think about this. (Be sure to review the Gathering Data with APIs Tutorial as well.)

Where does data come from in the real world?

The answer is not Kaggle 🙂

Data can come from:

  1. Experiments that generate data that is then collected and cleaned.
  2. Observational studies that generate data that is then collected and cleaned. This might include interviews, surveys, or sitting on a bench, watching folks, and taking down feature data.
  3. Organizations that collect and share data. Most (if not all) organizations collect data, such as Twitter, Facebook, the EPA, Zillow, Amazon, Walmart, etc. Sometimes, these organizations are willing to share some of their data. If they want to share, they will offer an API that we can use to GET their data.
  4. Web scraping, which is the process of gathering all text (much of which will be HTML) from webpage sources. This is NOT always legal or appropriate, so it is best to use only data that you gather with a valid API offered by the organization.
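
To make the idea of using an API to GET data (item 3 above) a little more concrete, here is a minimal sketch of what a GET request usually looks like: a base URL followed by query parameters, one of which is often your key. The base URL, the parameter names (zip, format, api_key), and the key value are all made up for illustration. Each real API documents its own.

```python
from urllib.parse import urlencode


def build_request_url(base_url, params, api_key):
    """Return base_url with the query parameters and the API key appended."""
    query = dict(params)
    # "api_key" is a hypothetical parameter name; check the API's documentation.
    query["api_key"] = api_key
    return f"{base_url}?{urlencode(query)}"


# Hypothetical endpoint and parameters, used only to show the shape of the URL.
url = build_request_url(
    "https://api.example.com/v1/observations",
    {"zip": "20002", "format": "json"},
    api_key="YOUR_KEY_HERE",
)
print(url)
```

Pasting a URL like this into a browser, or sending it from a program, is what "making a GET request" means in practice.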


https://docs.airnowapi.org/webservices
newsapi.org

HOW AND WHERE DO YOU FIND AND USE APIs?

The easiest way to find an API for data in an area of interest to you is to Google it. For example, if you Google “news API”, one of the options will be newsapi.org, which is very easy to use and a great way to both get news text data and practice with APIs.

Similarly, you can Google terms such as “air quality API” or “pollution API”. On the left, you can see the API made available by AirNow.

TO USE an API:

One option is to go right to the site and use their tools to generate small bits of data. For example, the following image was taken from AirNow. To get this, I registered first to get a KEY. I then logged in. I then selected the Web Services and then the Query tool.

AirNow query made by hand from the site

However, this is an inefficient method and does not lend itself well to collecting and cleaning large quantities of data.

For this reason, programming languages such as R and Python have libraries that allow us to use APIs to gather data and then to clean and prepare the data for analysis. The following tutorials will illustrate API use with R or Python.
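
As a rough preview of that gather-then-clean workflow, the sketch below uses only the Python standard library. It assumes a hypothetical API that returns a JSON list of records. The endpoint, the query parameter names, and the field names (city, aqi) are invented for illustration and will differ for any real API. The fetch step is passed in as a function so the cleaning logic can be tried with sample data before a key is available.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Send a GET request to url and decode the JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def clean_records(records, fields):
    """Keep only the named fields; drop records where any of them is missing or empty."""
    cleaned = []
    for record in records:
        if all(record.get(f) not in (None, "") for f in fields):
            cleaned.append({f: record[f] for f in fields})
    return cleaned


def collect(base_url, api_key, zip_codes, fields, fetch=fetch_json):
    """Query the API once per zip code and return all cleaned rows."""
    rows = []
    for zip_code in zip_codes:
        # "zip" and "api_key" are hypothetical parameter names.
        query = urlencode({"zip": zip_code, "api_key": api_key})
        rows.extend(clean_records(fetch(f"{base_url}?{query}"), fields))
    return rows


def fake_fetch(url):
    """Stand-in for a real request so the example runs without a network or key."""
    return [
        {"city": "A", "aqi": 42, "note": "extra field"},
        {"city": "B", "aqi": None},
    ]


sample_rows = collect(
    "https://api.example.com/v1/observations",  # hypothetical endpoint
    "YOUR_KEY_HERE",
    ["20002", "10001"],
    ["city", "aqi"],
    fetch=fake_fetch,
)
print(sample_rows)
```

Dropping the fetch=fake_fetch argument would make collect send real requests, which only makes sense once the URL, parameters, and key match an actual API.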