Record data is made up of rows and columns. Rows can also be referred to as instances, vectors, observations, etc. Columns can be referred to as variables, attributes, dimensions, features, etc.
While record data is partially organized, it must still be cleaned and prepared for analysis. This tutorial shows common steps for cleaning and preparing record data using R. I also strongly recommend that you review the tutorials and examples on general data cleaning.
Preparing, cleaning, and formatting datasets is not a one-size-fits-all task. The steps will differ depending on the data you start with, the models and methods you plan to use, and the programming language you plan to use.
This tutorial will illustrate how to read in and perform common cleaning steps on record data using R programming.
Topics:
- Data files and paths
- Managing missing values
- Managing incorrect values or incorrect value formats
- Visualizations
- Dropping rows or columns
- Checking and updating data types
- Cleaning and exploring using ggplot visualization
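As a rough illustration of several of the topics above, the following base R sketch reads a small dataset, counts missing values, fixes an impossible value, drops incomplete rows, and updates a column type. The data, the column names (`name`, `age`, `score`, `dept`), and the choice of median imputation are all made up for this sketch and are not taken from the tutorial's own dataset. The `text =` argument is used only so the sketch runs without a file on disk; with a real dataset you would pass a file path to `read.csv` instead. Plotting is left out to keep the sketch free of extra packages.

```r
# Hypothetical in-memory CSV standing in for a real data file
csv_text <- "name,age,score,dept
Ann,34,88.5,A
Bob,NA,92.1,B
Cy,-5,,A
Dee,41,79.0,B"

# Read the data; empty strings and "NA" are both treated as missing
df <- read.csv(text = csv_text, stringsAsFactors = FALSE,
               na.strings = c("", "NA"))

str(df)                # inspect column types
colSums(is.na(df))     # count missing values per column

# Incorrect value: a negative age is impossible, so mark it as missing
df$age[!is.na(df$age) & df$age < 0] <- NA

# Missing values: replace missing ages with the median of the known ages
# (median imputation is only one option, chosen here for illustration)
df$age[is.na(df$age)] <- median(df$age, na.rm = TRUE)

# Dropping rows: remove any row that still contains a missing value
df_clean <- df[complete.cases(df), ]

# Data types: store the department label as a factor rather than text
df_clean$dept <- as.factor(df_clean$dept)
```

Whether to impute, drop, or keep a missing value depends on the data and on the analysis planned, so each step here should be read as one possible choice rather than a rule.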
Example 2:
This next example shows data cleaning using R.
Topics include missing values, data types, file I/O, using sapply and lapply, other methods, and data visualization.
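To give a sense of how `sapply` and `lapply` fit into cleaning, here is a minimal base R sketch. The data frame and its column names (`id`, `height`, `city`) are invented for illustration and do not come from the example itself.

```r
# Hypothetical data frame in which every column was read in as text
df <- data.frame(
  id     = c("1", "2", "3"),
  height = c("5.5", "6.1", NA),
  city   = c("Austin", NA, "Boston"),
  stringsAsFactors = FALSE
)

# sapply simplifies its result to a named vector: one value per column
col_types <- sapply(df, class)
na_counts <- sapply(df, function(col) sum(is.na(col)))

# lapply returns a list; assigning back into df[num_cols] converts
# the selected columns while keeping the data frame structure
num_cols <- c("id", "height")
df[num_cols] <- lapply(df[num_cols], as.numeric)
```

The practical difference is the return type: `sapply` is convenient for quick per-column summaries, while `lapply` is the safer choice when the results are written back into the data frame.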
Example 3:
This example looks specifically at the identification and management of outliers.
It also uses Grubbs' test and data visualization.
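For readers who have not seen Grubbs' test before, the sketch below computes the two-sided version by hand in base R. The helper name `grubbs_flag`, the sample values, and the default `alpha = 0.05` are assumptions made for this sketch; the example itself may use a package or a different variant. The test assumes the data are approximately normal and checks only the single most extreme value.

```r
# Hypothetical helper: two-sided Grubbs' test for one suspected outlier
grubbs_flag <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  n <- length(x)
  stopifnot(n > 2)                     # the test needs at least 3 values

  # Test statistic: largest absolute deviation from the mean, in SD units
  dev <- abs(x - mean(x))
  G <- max(dev) / sd(x)

  # Critical value from the t distribution with n - 2 degrees of freedom
  t_crit <- qt(1 - alpha / (2 * n), df = n - 2)
  G_crit <- ((n - 1) / sqrt(n)) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

  list(suspect = x[which.max(dev)], G = G, G_crit = G_crit,
       is_outlier = G > G_crit)
}

# Made-up measurements with one value far from the rest
values <- c(10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0)
result <- grubbs_flag(values)
result$suspect      # the most extreme value
result$is_outlier   # TRUE when G exceeds the critical value
```

A flagged value is a candidate for review, not automatically an error; whether to remove, correct, or keep it depends on what is known about how the data were collected.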