Prepping Record Data in R – Gates Bolton Analytics

Record data is made up of rows and columns. Rows can also be referred to as instances, vectors, observations, etc. Columns can be referred to as variables, attributes, dimensions, features, etc.

While record data is partially organized, it must still be cleaned and prepared for analysis. This tutorial will show common steps for cleaning and preparing record data using R. I also strongly recommend that you review the tutorials and examples on general data cleaning.

Preparing, cleaning, and formatting datasets is not a one-size-fits-all task. The steps will differ depending on the data you start with, the models and methods you plan to use, and the programming language you work in.

This tutorial will illustrate how to read in and perform common cleaning steps on record data using R programming.

Topics:

Data files and paths
Managing missing values
Managing incorrect values or incorrect value formats
Visualizations
Dropping rows or columns
Checking and updating data types
Cleaning and exploring using ggplot visualization
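
The linked code is not reproduced here, so the following is only a minimal sketch of how several of these topics might look in base R. The toy data, the file name `toy_records.csv`, the column names, and the choice of median imputation are all assumptions made for illustration; they are not taken from the tutorial's code. Plotting with ggplot is left out so that the sketch runs without extra packages.

```r
# Hypothetical toy data, written to a temporary file so the sketch is self-contained
csv_path <- file.path(tempdir(), "toy_records.csv")
writeLines(c(
  "name,age,score,state",
  "Ann,34,88.5,VA",
  "Bob,,92.0,va",
  "Cy,29,NA,MD",
  "Dee,-5,75.0,MD"
), csv_path)

# Data files and paths: read the file, treating empty strings and "NA" as missing
records <- read.csv(csv_path, na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Managing missing values: count the missing entries in each column
na_counts <- colSums(is.na(records))

# Managing incorrect values: a negative age is impossible, so mark it as missing
records$age[which(records$age < 0)] <- NA

# Managing incorrect value formats: make the state codes consistently upper case
records$state <- toupper(records$state)

# One possible treatment (an assumption): fill missing ages with the median age
records$age[is.na(records$age)] <- median(records$age, na.rm = TRUE)

# Dropping rows or columns: remove rows with no score, then drop the name column
records <- records[!is.na(records$score), ]
records$name <- NULL

# Checking and updating data types: store state as a categorical variable
records$state <- as.factor(records$state)
str(records)
```

Whether to impute, drop, or flag a value depends on the dataset and the planned analysis, so each step above is one option among several.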

LINK TO CODE

Example 2:

This next example shows data cleaning using R.

Topics include missing values, data types, file I/O, using sapply and lapply, other methods, as well as data visualization.
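
As a rough illustration of where `sapply` and `lapply` tend to help during cleaning, the sketch below inspects every column of a small data frame and then converts selected columns to numeric. The data frame `df` and its columns are hypothetical and are not drawn from the linked code.

```r
# Hypothetical data frame in which numbers were read in as text
df <- data.frame(
  id     = c("1", "2", "3"),
  height = c("170", "165", NA),
  city   = c("Rome", "Oslo", "Lima"),
  stringsAsFactors = FALSE
)

# sapply simplifies its result to a named vector: one class per column
col_types <- sapply(df, class)

# The same pattern counts the missing values in each column
na_per_col <- sapply(df, function(col) sum(is.na(col)))

# lapply returns a list, which can be assigned back to several columns at once
numeric_cols <- c("id", "height")
df[numeric_cols] <- lapply(df[numeric_cols], as.numeric)

print(col_types)
print(na_per_col)
str(df)
```

`sapply` is convenient for quick summaries because it returns a vector, while `lapply` always returns a list, which is why it is the one used for the column-wise assignment.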

LINK TO CODE

Example 3:

This example looks specifically at the identification and management of outliers.

It also uses Grubbs' test and data visualization.
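
For readers who want to see the idea before opening the code, here is a minimal base R sketch of the two-sided Grubbs' test for a single outlier. It is written by hand under stated assumptions and is not the linked example: the function name `grubbs_check`, the sample values, and the default `alpha = 0.05` are all hypothetical. The test also assumes the data are approximately normally distributed.

```r
# Two-sided Grubbs' test for one outlier, computed by hand (illustrative sketch)
grubbs_check <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  n <- length(x)
  stopifnot(n >= 3)  # the critical value needs at least 3 observations

  # Test statistic: largest absolute deviation from the mean, in sd units
  deviations <- abs(x - mean(x))
  g <- max(deviations) / sd(x)

  # Critical value from the t distribution with n - 2 degrees of freedom
  t_crit <- qt(alpha / (2 * n), df = n - 2)
  g_crit <- ((n - 1) / sqrt(n)) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

  list(
    statistic  = g,
    critical   = g_crit,
    suspect    = x[which.max(deviations)],
    is_outlier = g > g_crit
  )
}

# Hypothetical measurements with one value far from the rest
values <- c(10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0)
result <- grubbs_check(values)
print(result)

# A tighter hypothetical sample for comparison
calm <- grubbs_check(c(10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.4))
print(calm$is_outlier)
```

A flagged value is a candidate for review rather than automatic removal; plotting the data, for instance with a boxplot, helps decide whether it is an error or a genuine extreme.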

LINK TO CODE