Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Using gbm

Generalised boosted models, as proposed by Friedman (2001) and extended by Friedman (2002), have been implemented for R as the gbm package by Greg Ridgeway. This is a much more extensive package for boosting than the boost package.

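We illustrate AdaBoost by passing distribution="adaboost" to the gbm function. This distribution expects a 0/1 response, so Type is first recoded to a binary target: class 1 stays 1 and every other class becomes 0.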

> library(gbm)
> load("wine.RData")
> ds <- wine
> ds$Type <- as.numeric(ds$Type)
> ds$Type[ds$Type>1] <- 0
> ds$Type
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
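
The recoding above relies on as.numeric() returning the level codes of a factor, which is assumed here to be how Type is stored. As a minimal sketch, the same step can be tried on a small made-up factor (the values below are illustrative and are not the wine data), alongside a more explicit comparison that does not depend on level codes:

```r
# Hypothetical three-class factor standing in for wine$Type
type <- factor(c("1", "1", "2", "3", "2", "1"))

# Approach used in the session above: level codes, then collapse codes > 1 to 0
y <- as.numeric(type)
y[y > 1] <- 0

# More explicit alternative: 1 where the label is "1", otherwise 0
y2 <- as.integer(type == "1")

print(y)
print(y2)
```

The explicit comparison is arguably safer when the factor levels are not simply 1, 2, 3, since level codes follow the level ordering rather than the labels.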
> ds.gbm <- gbm(Type ~ Alcohol + Malic + Ash + Alcalinity + Magnesium +
                Phenols + Flavanoids + Nonflavanoids + Proanthocyanins +
                Color + Hue + Dilution + Proline,
                data=ds, distribution="adaboost", n.trees=100)
Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9408             nan     0.0010    0.0006
     2        0.9402             nan     0.0010    0.0006
     3        0.9394             nan     0.0010    0.0007
     4        0.9387             nan     0.0010    0.0007
     5        0.9381             nan     0.0010    0.0005
     6        0.9374             nan     0.0010    0.0006
     7        0.9368             nan     0.0010    0.0006
     8        0.9361             nan     0.0010    0.0007
     9        0.9354             nan     0.0010    0.0006
    10        0.9349             nan     0.0010    0.0004
   100        0.8750             nan     0.0010    0.0007
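
The starting value of the TrainDeviance column can be checked by hand. The sketch below assumes that gbm's adaboost distribution uses the exponential loss exp(-(2y - 1) f) on a 0/1 response and reports its mean as the deviance; the function name exp_loss is introduced here for illustration only. The class counts (59 ones, 119 zeros) are read off the ds$Type listing above.

```r
# Class counts taken from the ds$Type listing: 59 ones, 119 zeros
y <- c(rep(1, 59), rep(0, 119))

# Assumed AdaBoost (exponential) loss on a 0/1 response, averaged over cases
exp_loss <- function(y, f) mean(exp(-(2 * y - 1) * f))

# Constant score that minimises this loss before any tree is added
f0 <- 0.5 * log(sum(y) / sum(1 - y))

# Deviance of the constant model, to compare with iteration 1 of the trace
dev0 <- exp_loss(y, f0)

# Mapping the score to a probability with 1 / (1 + exp(-2 f)) recovers the base rate
p0 <- 1 / (1 + exp(-2 * f0))

print(c(f0 = f0, dev0 = dev0, p0 = p0))
```

Under these assumptions the constant model has a deviance of about 0.9415, just above the 0.9408 reported at iteration 1. With a step size of 0.001, each tree moves the fit only slightly, which is why 100 trees bring the deviance down no further than 0.8750.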

> summary(ds.gbm)
               var  rel.inf
1          Proline 91.82978
2       Flavanoids  8.17022
3          Alcohol  0.00000
4            Malic  0.00000
5              Ash  0.00000
6       Alcalinity  0.00000
7        Magnesium  0.00000
8          Phenols  0.00000
9    Nonflavanoids  0.00000
10 Proanthocyanins  0.00000
11           Color  0.00000
12             Hue  0.00000
13        Dilution  0.00000
> pretty.gbm.tree(ds.gbm)
  SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight
0       12  8.675000e+02        1         2           3       65.36408     89
1       -1 -8.139656e-04       -1        -1          -1        0.00000     62
2       -1  9.236987e-04       -1        -1          -1        0.00000     27
3       -1 -2.868090e-04       -1        -1          -1        0.00000     89
     Prediction
0 -0.0002868090
1 -0.0008139656
2  0.0009236987
3 -0.0002868090
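
To make the table above easier to read, here is a small sketch that walks it by hand. It assumes the usual reading of pretty.gbm.tree output: SplitVar is a zero-based index into the predictors (so 12 is Proline, the thirteenth term in the formula), -1 marks a terminal node, SplitCodePred holds the split point for a split node and the prediction for a terminal node, and cases with a value below the split point go left. The function name score_tree is hypothetical.

```r
# First tree, copied from the pretty.gbm.tree output (rows are nodes 0 to 3)
tree <- data.frame(
  SplitVar      = c(12, -1, -1, -1),
  SplitCodePred = c(867.5, -8.139656e-04, 9.236987e-04, -2.868090e-04),
  LeftNode      = c(1, -1, -1, -1),
  RightNode     = c(2, -1, -1, -1),
  MissingNode   = c(3, -1, -1, -1)
)

# Predictors in the order they appear in the model formula
var.names <- c("Alcohol", "Malic", "Ash", "Alcalinity", "Magnesium",
               "Phenols", "Flavanoids", "Nonflavanoids", "Proanthocyanins",
               "Color", "Hue", "Dilution", "Proline")

# Follow one observation (a named list) from the root to a terminal node
score_tree <- function(tree, x) {
  node <- 0
  repeat {
    row <- tree[node + 1, ]               # node ids are zero-based
    if (row$SplitVar == -1) {
      return(row$SplitCodePred)           # terminal node: this is the prediction
    }
    value <- x[[var.names[row$SplitVar + 1]]]
    if (is.na(value)) {
      node <- row$MissingNode
    } else if (value < row$SplitCodePred) {
      node <- row$LeftNode
    } else {
      node <- row$RightNode
    }
  }
}

print(score_tree(tree, list(Proline = 1000)))  # right branch
print(score_tree(tree, list(Proline = 500)))   # left branch
print(score_tree(tree, list(Proline = NA)))    # missing branch
```

Each tree contributes only a tiny increment to the overall score; the sign of the increment indicates which class the node leans towards.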
> gbm.show.rules(ds.gbm)
Number of models: 100

Tree 1: Weight XXXX
  Proline < 867.50 : 0 (XXXX/XXXX)
  Proline >= 867.50 : 1 (XXXX/XXXX)
  Proline missing : 0 (XXXX/XXXX)
[...]
Tree 100: Weight XXXX
  Proline < 755.00 : 0 (XXXX/XXXX)
  Proline >= 755.00 : 1 (XXXX/XXXX)
  Proline missing : 0 (XXXX/XXXX)



Copyright © 2004-2006 [email protected]