Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Data Cleaning

Data cleaning deals with removing errant transactions, updating transactions to account for reversals, eliminating missing data, and so on.
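As a minimal sketch of the first two activities, the following Python removes errant transactions and cancels reversed ones. The record layout (`txn_id`, `account`, `amount`, `reverses`) and the rule for what counts as errant are assumptions made for illustration only; real data sources will define both differently.

```python
from collections import namedtuple

# Hypothetical record layout; the field names are assumptions for illustration.
Transaction = namedtuple("Transaction", ["txn_id", "account", "amount", "reverses"])

def clean_transactions(transactions):
    """Drop errant rows, then drop each reversal together with what it reverses."""
    # Assumed rule: a row is errant if it has no account or a non-numeric amount.
    valid = [
        t for t in transactions
        if t.account and isinstance(t.amount, (int, float))
    ]
    # Ids of transactions that some other record reverses.
    reversed_ids = {t.reverses for t in valid if t.reverses is not None}
    # Keep only records that are neither reversals nor reversed.
    return [
        t for t in valid
        if t.reverses is None and t.txn_id not in reversed_ids
    ]

raw = [
    Transaction(1, "A100", 250.0, None),
    Transaction(2, "A100", -250.0, 1),   # reversal of transaction 1
    Transaction(3, "", 40.0, None),      # errant: no account
    Transaction(4, "B200", 75.5, None),
]
cleaned = clean_transactions(raw)
print([t.txn_id for t in cleaned])
```

Dropping both halves of a reversed pair is only one possible policy; some analyses may prefer to keep the pair and net the amounts instead.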

The aim of data cleaning is to raise the data quality to a level suitable for the selected analyses.

The data cleaning to be performed depends on the purpose to which the data is to be put. Some activities will require a selection of data cleaning and data transformation modules to be applied to the data.

Data cleaning occurs early in the process and then continues throughout it as we learn more about the data.

Field selection

Sampling

Data correction

Missing values treatment

Data transformation, e.g., birth date to age.

Derive new fields
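The tasks listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a prescribed method: the records, the field names, the choice of mean imputation for missing values, and the age bands are all hypothetical.

```python
import random
from datetime import date

# Hypothetical raw records; every field name and value is made up for illustration.
records = [
    {"id": 1, "birth_date": date(1970, 5, 17), "income": 52000, "gender": "F", "notes": "x"},
    {"id": 2, "birth_date": date(1985, 11, 2), "income": None, "gender": "m", "notes": "y"},
    {"id": 3, "birth_date": date(1992, 1, 30), "income": 61000, "gender": "M", "notes": "z"},
    {"id": 4, "birth_date": date(1960, 8, 9), "income": 45000, "gender": "f", "notes": ""},
]

def age_on(birth_date, as_of):
    """Whole years between birth_date and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)

def clean(records, as_of, sample_size, seed=42):
    # Field selection: keep only the fields assumed relevant to the analysis.
    keep = ("id", "birth_date", "income", "gender")
    rows = [{k: r[k] for k in keep} for r in records]

    # Sampling: a reproducible simple random sample.
    rows = random.Random(seed).sample(rows, min(sample_size, len(rows)))

    # Missing values treatment: impute the mean of the observed incomes
    # (one simple option among many).
    observed = [r["income"] for r in rows if r["income"] is not None]
    mean_income = sum(observed) / len(observed) if observed else None

    for r in rows:
        # Data correction: normalise inconsistent codes.
        r["gender"] = r["gender"].upper()
        if r["income"] is None:
            r["income"] = mean_income
        # Data transformation: birth date to age.
        r["age"] = age_on(r.pop("birth_date"), as_of)
        # Derive a new field: a coarse age band.
        r["age_band"] = "under 40" if r["age"] < 40 else "40 and over"
    return sorted(rows, key=lambda r: r["id"])

cleaned = clean(records, as_of=date(2006, 1, 1), sample_size=4)
for row in cleaned:
    print(row)
```

Passing the reference date `as_of` explicitly, instead of using today's date, keeps the derived ages reproducible when the cleaning is rerun later.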

Useful steps:

Understand the business problem.

Collect the materials about the data sources and study them to understand what data is available.

Identify the data items relevant to the business problem, e.g., tables and attributes.

Make a data extraction plan and arrange the data extraction (with DBAs).

Calculate the summary statistics of the extracted data.
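The last step, calculating summary statistics, might look like the following sketch, which uses only the Python standard library. The `summarise` helper, the `amount` field, and the sample values are hypothetical, and the function assumes the field has at least one non-missing value.

```python
import statistics

def summarise(rows, field):
    """Basic summary statistics for one numeric field, counting missing values."""
    # Assumes at least one non-missing value; min() and max() fail otherwise.
    values = [r[field] for r in rows if r.get(field) is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

# Hypothetical extract; the field name and values are assumptions.
extract = [
    {"amount": 10.0},
    {"amount": 20.0},
    {"amount": None},
    {"amount": 30.0},
    {"amount": 40.0},
]
summary = summarise(extract, "amount")
print(summary)
```

Reporting the number of missing values alongside the usual statistics gives an early indication of how much missing-value treatment the extract will need.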



Copyright © 2004-2006 [email protected]