DATA MINING
Desktop Survival Guide by Graham Williams
Evaluating the outcomes of data mining is important. We need to understand how well any model we build can be expected to perform, and how well it performs in comparison to other models we might choose to build.
A common approach is to compute an error rate, which simply reports the proportion of cases that the model incorrectly classifies. Common methods for estimating the empirical error rate are, for example, cross-validation (CV), the Bayesian evidence framework, and the PAC framework.
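As a minimal sketch of the error rate calculation, the R code below builds a confusion matrix with table and computes the proportion of misclassified cases. The vectors actual and predicted are made-up labels for ten cases, assumed here purely for illustration; in practice the predictions would come from a model applied to data held out from training.

```r
# Hypothetical class labels for ten cases (illustrative data only).
actual    <- factor(c("yes", "no", "yes", "yes", "no",
                      "no", "yes", "no", "no", "yes"),
                    levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "no", "yes", "no",
                      "yes", "yes", "no", "no", "yes"),
                    levels = c("no", "yes"))

# Confusion matrix: rows are the actual classes, columns the predicted ones.
confusion <- table(Actual = actual, Predicted = predicted)
print(confusion)

# Error rate: the proportion of cases where prediction and actual differ.
error_rate <- mean(predicted != actual)
print(error_rate)

# The same figure can be read off the confusion matrix as the share of
# cases lying off the diagonal.
off_diagonal <- 1 - sum(diag(confusion)) / sum(confusion)
```

Here two of the ten cases are misclassified, so the error rate is 0.2. An error rate computed on the same data used to build the model tends to be optimistic, which is why estimates such as cross-validation are preferred.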
In this chapter we introduce several measures used to report on the performance of a model and review various approaches to evaluating the output of data mining. This will cover printcp, table for producing confusion matrices, ROCR for the graphical presentation of evaluations, as well as how to tune the presentations for your own needs.