Togaware DATA MINING
Desktop Survival Guide
by Graham Williams
Google

Impute

Imputation is the process of filling in the gaps (or missing values) in data. Often data will contain missing values, and this can cause a problem for some modelling algorithms. For example, the random forest option silently removes any entity with any missing value! For datasets with a very large number of variables, and a reasonable number of missing values, this may well result in a small, unrepresentative dataset, or even no dataset at all!

XXXX

When Rattle performs an imputation it will store the results in a variable of the dataset which has the same name as the variable that is imputed, but prefixed with IMP_. Such variables, whether they are imputed by Rattle or already existed in the dataset loaded into Rattle (e.g., a dataset from SAS), will be treated as input variables, and the original variable marked to be ignored.



Subsections

Copyright © 2004-2006 [email protected]
Support further development through the purchase of the PDF version of the book.
Brought to you by Togaware.