Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Basics

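Use printcp to view the performance of the model. It prints the complexity parameter (CP) table, which lists the relative error (rel error) and the cross-validated error (xerror, with its standard deviation xstd) for each number of splits (nsplit).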

> printcp(wine.rpart)

Classification tree:
rpart(formula = Type ~ ., data = wine)

Variables actually used in tree construction:
[1] Dilution   Flavanoids Hue        Proline

Root node error: 107/178 = 0.60112

n= 178

        CP nsplit rel error  xerror     xstd
1 0.495327      0   1.00000 1.00000 0.061056
2 0.317757      1   0.50467 0.47664 0.056376
3 0.056075      2   0.18692 0.28037 0.046676
4 0.028037      3   0.13084 0.23364 0.043323
5 0.010000      4   0.10280 0.21495 0.041825

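Note that each rel error value can be derived from the row before it in the table:

\begin{displaymath}
\mathrm{rel\ error} = \mathrm{rel\ error}_{\mathrm{before}} - (\mathrm{nsplit} - \mathrm{nsplit}_{\mathrm{before}}) \times \mathrm{CP}_{\mathrm{before}}
\end{displaymath}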
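The relationship can be checked against the printed table. The following is a small sketch in base R that copies the CP, nsplit and rel error columns by hand from the printcp output (the variable names are chosen here for illustration) and recomputes rows 2 to 5 from the rows before them.

```r
# Values copied by hand from the printcp output above.
cp        <- c(0.495327, 0.317757, 0.056075, 0.028037, 0.010000)
nsplit    <- c(0, 1, 2, 3, 4)
rel.error <- c(1.00000, 0.50467, 0.18692, 0.13084, 0.10280)

# Recompute rel error for rows 2..5 from the preceding row:
# rel error(before) - (nsplit - nsplit(before)) * CP(before)
n <- length(cp)
recomputed <- rel.error[-n] - diff(nsplit) * cp[-n]

# Largest difference from the printed values; only rounding error should remain.
max.diff <- max(abs(recomputed - rel.error[-1]))
print(round(recomputed, 5))
print(max.diff)
```

Any remaining difference comes from the five-digit rounding of the printed table.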

The predict function applies the model to data. The data must contain every variable that appeared in the formula used to build the model, not only the variables the tree actually uses. If not, an error is generated. This is a common problem when applying the model to a new dataset that does not contain all of the original variables, but does contain the variables you are interested in.

> cols <- c("Type", "Dilution", "Flavanoids", "Hue", "Proline")
> predict(wine.rpart, wine[,cols])
Error in eval(expr, envir, enclos) : Object "Alcohol" not found
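A check along the following lines can reveal the missing variables before predict is called. This is only a sketch in base R: the data frame train is a small made-up stand-in for the wine data (its values are illustrative, not real), and the names expected, newdata and missing.vars are chosen here for illustration.

```r
# Small made-up data frame standing in for the wine data (illustrative values only).
train <- data.frame(Type    = factor(c(1, 2, 3, 1)),
                    Alcohol = c(14.2, 12.4, 13.1, 13.8),
                    Hue     = c(1.04, 1.05, 0.70, 1.03),
                    Proline = c(1065, 520, 630, 1050))

# Expand the "." in the formula to list the predictors the model expects.
expected <- attr(terms(Type ~ ., data = train), "term.labels")

# New data that lacks one of those columns.
newdata <- train[, c("Type", "Hue", "Proline")]

# Predictors named in the formula but absent from the new data.
missing.vars <- setdiff(expected, names(newdata))
print(missing.vars)
```

Here the only missing predictor is Alcohol, which mirrors the error message above.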

Fix the error by rebuilding the model with a formula that names only the variables available in the new data:

> wine.rpart <- rpart(Type ~ Dilution + Flavanoids + Hue + Proline,
+                     data=wine)
> predict(wine.rpart, wine[,cols])
             1          2          3
1   0.96610169 0.03389831 0.00000000
2   0.96610169 0.03389831 0.00000000
[...]
70  0.03076923 0.93846154 0.03076923
71  0.00000000 0.25000000 0.75000000
[...]
177 0.00000000 0.25000000 0.75000000
178 0.00000000 0.02564103 0.97435897
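Each row of this output gives the estimated probability of each class for one observation. As a rough illustration, the sketch below copies four of the rows by hand and picks the most probable class for each. The object names probs and predicted are chosen here for illustration. In practice predict with type="class" returns the class directly.

```r
# Four rows of class probabilities copied by hand from the predict output above.
probs <- matrix(c(0.96610169, 0.03389831, 0.00000000,
                  0.03076923, 0.93846154, 0.03076923,
                  0.00000000, 0.25000000, 0.75000000,
                  0.00000000, 0.02564103, 0.97435897),
                ncol = 3, byrow = TRUE,
                dimnames = list(c("1", "70", "71", "178"), c("1", "2", "3")))

# For each row, take the column (class) with the highest probability.
predicted <- colnames(probs)[apply(probs, 1, which.max)]
names(predicted) <- rownames(probs)
print(predicted)
```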

Display a confusion matrix, with the predicted class in the rows and the actual class in the columns.

> table(predict(wine.rpart, wine, type="class"), wine$Type)

     1  2  3
  1 57  2  0
  2  2 66  4
  3  0  3 44
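The confusion matrix can be summarised as an accuracy figure. The sketch below copies the counts by hand from the table above (the names cm, correct, errors and so on are chosen here for illustration) and also relates the number of errors to the root node error count of 107 reported by printcp.

```r
# Confusion matrix copied by hand from the table output above
# (rows: predicted class, columns: actual class).
cm <- matrix(c(57,  2,  0,
                2, 66,  4,
                0,  3, 44),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = c("1", "2", "3"),
                             actual    = c("1", "2", "3")))

correct  <- sum(diag(cm))     # observations on the diagonal are classified correctly
total    <- sum(cm)           # all observations
errors   <- total - correct   # misclassified observations
accuracy <- correct / total

# Errors relative to the 107 errors made at the root node.
rel.error <- errors / 107
print(c(accuracy = accuracy, errors = errors, rel.error = rel.error))
```

The 11 misclassified observations out of 107 at the root give about 0.1028, which agrees with the final rel error in the printcp table, assuming the rebuilt tree is the same as the original one.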



Copyright © 2004-2006 [email protected]