APIs in R
R can be used to access, download, and prepare data for analysis.
One method for gathering data is to use an API (application programming interface). Many sites and organizations, such as Twitter, Wikipedia, and AirNow, offer an API so that programmers can access and gather data, including text data.
The following code will show how to use an API in R to access data from a site called AirNow. Keep in mind that you can use similar methods to access data from any site/application that offers an API option. Each site/app has its own documentation, endpoints, and parameters. To learn these, it is best to read the documentation for any site/app you plan to access via API.
################# Gates
## Use R to access data with an API
#install.packages("jsonlite")
library(jsonlite)
## First - go to
## https://docs.airnowapi.org/webservices
## Register
## Log in
## Then click Web Services
## Option 1: Use the site to BUILD the URL
## https://docs.airnowapi.org/forecastsbyzip/query
## !!!!!!!!!!
## You will need a KEY - therefore you must register first, as
## noted above.
## !!!!!!!!!!
#install.packages("httr")
library("httr")
library("jsonlite")
## Get the data
## BUILD THE URL
base <- "http://www.airnowapi.org/aq/forecast/zipCode/"
format <- "text/csv"
zipCode <- "20002"
date <- "2022-01-16"
## Replace this placeholder with the key from your AirNow account
API_KEY <- "D9A Your key here 8C2C7"
distance <- "25"
## Paste the pieces together as base?name=value&name=value...
call1 <- paste0(base, "?",
                "format", "=", format, "&",
                "zipCode", "=", zipCode, "&",
                "date", "=", date, "&",
                "API_KEY", "=", API_KEY, "&",
                "distance", "=", distance)
(call1)
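## A sketch of another way to build the same kind of URL: httr::modify_url()
## takes the parameters as a named list and percent-encodes the values.
## The name call2 and the key "YOUR_KEY" are placeholders for illustration,
## and this assumes the AirNow server accepts percent-encoded values.
call2 <- httr::modify_url(
  "http://www.airnowapi.org/aq/forecast/zipCode/",
  query = list(format = "text/csv",
               zipCode = "20002",
               date = "2022-01-16",
               API_KEY = "YOUR_KEY",
               distance = "25")
)
(call2)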
AirNowAPI_Call<-httr::GET(call1)
(AirNowAPI_Call)
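## Before parsing, it can help to confirm that the request succeeded.
## A minimal sketch: check_status() is a hypothetical helper (not part of
## httr) that stops when httr::http_error() reports a status of 400 or
## above. How AirNow signals a bad key is not covered here.
check_status <- function(resp) {
  if (httr::http_error(resp)) {
    stop("Request failed with HTTP status ", httr::status_code(resp))
  }
  invisible(resp)
}
## Usage: check_status(AirNowAPI_Call)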
(MYDF<-httr::content(AirNowAPI_Call))
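## httr::content() picks a parser based on the response type. A sketch of a
## more explicit route: ask for the body as text with
## httr::content(AirNowAPI_Call, as = "text", encoding = "UTF-8") and pass
## it to read.csv(text = ...). The csv_text below is made-up sample text so
## these lines run without a key; the AQI column is an assumption.
csv_text <- "DateIssue,StateCode,AQI\n2022-01-16,DC,42\n2022-01-16,DC,38"
ExampleDF <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(ExampleDF)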
## Print to a file
AirName <- "AirFileExample2.csv"
## Write the AirNow data to the file (saved in the current working directory)
write.csv(MYDF, AirName, row.names = FALSE)
#####################################################################
## Open the file, read the data, and create a dataframe for analysis
#####################################################################
## Technically, we already have all of this data in MYDF. Have a look
MYDF
## But, let's read it in anyway to see how that works.
## This is where the data is located on MY computer ;)
## Change this path to wherever the file was saved on yours.
filepath <- "C:/Users/profa/Desktop/GatesBoltonAnalyticsSite/DATA/AirFileExample2.csv"
(AirNowDataFrame <- read.csv(filepath))
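## A path-independent sketch of the same write-then-read round trip, using
## tempfile() instead of a fixed folder. tmp_path and RoundTrip are
## illustrative names, and the small dataframe is made-up sample data.
tmp_path <- tempfile(fileext = ".csv")
write.csv(data.frame(StateCode = "DC", AQI = 42), tmp_path, row.names = FALSE)
(RoundTrip <- read.csv(tmp_path))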
## Is this a dataframe? Let's check
str(AirNowDataFrame) ## Yes!
## What are the data types of the variables?
## We can see that DateIssue is type "chr".
## That's not right, so to CLEAN this we need to change it to a date type
AirNowDataFrame$DateIssue <- as.Date(AirNowDataFrame$DateIssue)
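## A small sketch of the same conversion on made-up values. Giving as.Date()
## an explicit format avoids guessing, and trimws() drops stray spaces.
## The "%Y-%m-%d" format is an assumption about how DateIssue is written.
raw_dates <- c("2022-01-16 ", "2022-01-17")
clean_dates <- as.Date(trimws(raw_dates), format = "%Y-%m-%d")
class(clean_dates)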
## There are many data types that will need to be corrected. This is common
## and always part of cleaning and preparing data.
## We can change StateCode to factor type
AirNowDataFrame$StateCode <- as.factor(AirNowDataFrame$StateCode)
## When you are done making all needed changes - check the types again
str(AirNowDataFrame)
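## When several columns need the same fix, the conversion can be applied in
## one step. A sketch on a made-up dataframe: ToyDF and its CategoryName and
## AQI columns are assumptions for illustration only.
ToyDF <- data.frame(StateCode = c("DC", "MD"),
                    CategoryName = c("Good", "Moderate"),
                    AQI = c("42", "55"),
                    stringsAsFactors = FALSE)
factor_cols <- c("StateCode", "CategoryName")
ToyDF[factor_cols] <- lapply(ToyDF[factor_cols], as.factor)
ToyDF$AQI <- as.integer(ToyDF$AQI)
str(ToyDF)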
## This data is NOT yet ready for analysis. There is a column (variable)
## called "Discussion" that contains text data.
## One option is to remove that column from the dataframe, save the text
## as its own csv, and then vectorize it separately. Here, each row in
## Discussion is the same - but this will not always be the case -
## especially with a larger dataset.
AirNowDataFrame
AirNowDataFrame$Discussion[1]
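## A sketch of the option described above, on made-up rows: move the text
## column into its own object, drop it from the dataframe, and build a
## simple word-count (bag-of-words) matrix in base R. TextDF, Vocab, and
## DTM are illustrative names; a text-mining package is a common alternative.
TextDF <- data.frame(AQI = c(42, 55),
                     Discussion = c("Air quality is good today",
                                    "Moderate air quality expected"),
                     stringsAsFactors = FALSE)
DiscussionText <- TextDF$Discussion   ## keep the text on its own
TextDF$Discussion <- NULL             ## remove it from the dataframe
## Lowercase, split on anything that is not a letter, then count each word
Words <- strsplit(tolower(DiscussionText), "[^a-z]+")
Vocab <- sort(unique(unlist(Words)))
DTM <- t(sapply(Words, function(w) table(factor(w, levels = Vocab))))
DTM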