Data Cleaning

Data cleaning, preparation, and formatting are critical to data science.

Even the best analysis will produce poor results if it is applied to poor data.

Read the chapter to the right to learn about cleaning, preparing, normalizing, and transforming record and text data.

Learn the differences between record, text, network, image, sequential, and other types of data.

The PowerPoint slides cover the following topics:

  1. Data cleaning and prep
    1. Missing values
    2. Incorrect values
  2. Outliers and outlier detection
  3. Grubbs’ test for outliers
  4. Visualizations for outliers
  5. Binning (discretization)
  6. Aggregation
  7. Sampling methods
  8. Dimensionality
  9. Feature selection and engineering
  10. Python/Pandas examples
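
As a rough illustration of a few of the topics above (missing values, outliers, and binning), here is a minimal pandas sketch. It is not taken from the slides. The toy data, the column names, the median fill, and the bin edges are all assumptions chosen for the example, and the outlier check uses the simple 1.5 * IQR rule rather than Grubbs’ test.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing age and one implausible age (200).
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cal", "Dee", "Eve", "Fay"],
    "age": [23.0, 25.0, np.nan, 31.0, 200.0, 28.0],
})

# Missing values: count them, then fill with the column median
# (one of several possible strategies; dropping the row is another).
n_missing = int(df["age"].isna().sum())
df["age_filled"] = df["age"].fillna(df["age"].median())

# Outliers: flag values more than 1.5 * IQR beyond the quartiles.
q1, q3 = df["age_filled"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["age_filled"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df["is_outlier"] = ~in_range

# Binning (discretization): cut the remaining ages into labeled ranges.
# The bin edges here are arbitrary and only for illustration.
clean = df.loc[~df["is_outlier"]].copy()
clean["age_group"] = pd.cut(
    clean["age_filled"],
    bins=[0, 25, 30, 120],
    labels=["<=25", "26-30", ">30"],
)

print(f"missing ages: {n_missing}")
print(df[["name", "age_filled", "is_outlier"]])
print(clean[["name", "age_filled", "age_group"]])
```

Whether a flagged value should be removed, corrected, or kept depends on the data and the question being asked, so the flag column is kept separate from the filtering step.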