Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Graphing Means and Error Bars

The simplest plot of means is produced by the plotmeans function of the gplots package. The example uses the wine dataset, grouping the observations into the three classes defined by Type and plotting, for each class, the mean of Magnesium and the mean of Phenols. By default plotmeans also draws error bars showing a 95% confidence interval around each mean.

[Figure: rplot-line-means, plotmeans() of Magnesium and Phenols by Type]


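library("gplots")             # provides plotmeans()

load("wine.Rdata")            # loads the data frame 'wine'
attach(wine)                  # makes Type, Magnesium, Phenols, ... visible by name

pdf("graphics/rplot-line-means.pdf")   # the graphics/ directory must already exist
  par(mfrow=c(1,2))           # one row of two plots, side by side
  plotmeans(Magnesium ~ Type) # mean of Magnesium for each Type, with error bars
  plotmeans(Phenols ~ Type)   # mean of Phenols for each Type, with error bars
dev.off()                     # close the PDF device and write the file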

http://rattle.togaware.com/code/rplot-line-means.R

Both plots are placed onto the one plotting canvas using par(mfrow=c(1,2)). Placing them side by side makes each panel tall and narrow, which visually exaggerates the bars around the means; note also that each panel has its own y-axis scale. A visual inspection suggests that the three groups have quite different means for both Magnesium and Phenols, with the difference more pronounced for Phenols.
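To see roughly what the bars in each panel represent, the sketch below computes, for each group, the sample mean and a t-based interval, mean ± qt((1+p)/2, n-1) × sd/sqrt(n). The helper group_ci() is hypothetical (it is not part of gplots), and we assume it matches plotmeans' default behaviour (p=0.95, use.t=TRUE, per-group variances); check ?plotmeans before relying on that. The built-in PlantGrowth data, which also has three groups, stands in for wine so that the block runs on its own.

```r
# Hypothetical helper (not part of gplots): per-group mean and a
# t-based confidence interval, mean +/- qt((1+p)/2, n-1) * sd/sqrt(n).
group_ci <- function(y, g, p = 0.95) {
  g    <- factor(g)
  n    <- tapply(y, g, length)          # observations per group
  m    <- tapply(y, g, mean)            # group means
  se   <- tapply(y, g, sd) / sqrt(n)    # standard error of each mean
  half <- qt((1 + p) / 2, df = n - 1) * se
  data.frame(group = levels(g),
             n     = as.vector(n),
             mean  = as.vector(m),
             lower = as.vector(m - half),
             upper = as.vector(m + half))
}

# PlantGrowth (from the datasets package) stands in for the wine data.
ci <- group_ci(PlantGrowth$weight, PlantGrowth$group)
print(ci)
```

With wine attached as above, group_ci(Phenols, Type) would give the corresponding numbers for the right-hand panel.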

We can evaluate this statistically using R. Testing whether the means of a variable differ across subsets (groups) of a dataset is called analysis of variance (http://en.wikipedia.org/wiki/analysis_of_variance), or ANOVA (http://en.wikipedia.org/wiki/ANOVA). Here we compare the means of Magnesium and, separately, the means of Phenols across the three Types.
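ANOVA compares the spread between the group means with the spread within the groups. With k groups and N observations, F = [SS_between/(k-1)] / [SS_within/(N-k)], and Pr(>F) is the upper-tail probability of an F distribution with (k-1, N-k) degrees of freedom. The sketch below computes this by hand with a hypothetical helper, one_way_anova(), and prints anova(lm()) alongside for comparison; PlantGrowth again stands in for wine so the block is self-contained.

```r
# One-way ANOVA computed from its definition, as a check on what
# anova(lm(y ~ g)) reports. one_way_anova() is a hypothetical helper.
one_way_anova <- function(y, g) {
  g     <- factor(g)
  k     <- nlevels(g)                  # number of groups
  N     <- length(y)                   # total number of observations
  grand <- mean(y)
  means <- tapply(y, g, mean)          # one mean per level, in level order
  sizes <- tapply(y, g, length)
  ss_between <- sum(sizes * (means - grand)^2)  # the group ("Type") row Sum Sq
  ss_within  <- sum((y - means[g])^2)           # the "Residuals" row Sum Sq
  f <- (ss_between / (k - 1)) / (ss_within / (N - k))
  c(df1 = k - 1, df2 = N - k, F = f,
    p = pf(f, k - 1, N - k, lower.tail = FALSE))
}

res <- one_way_anova(PlantGrowth$weight, PlantGrowth$group)
print(res)
print(anova(lm(weight ~ group, data = PlantGrowth)))
```

Indexing means[g] by a factor uses its integer codes, which follow the same level order as the tapply() result, so each observation is paired with its own group mean.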



> anova(lm(Phenols ~ Type))

Analysis of Variance Table

Response: Phenols
           Df Sum Sq Mean Sq F value    Pr(>F)    
Type        2 35.857  17.928  93.733 < 2.2e-16 ***
Residuals 175 33.472   0.191                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

> anova(lm(Magnesium ~ Type))

Analysis of Variance Table

Response: Magnesium
           Df  Sum Sq Mean Sq F value    Pr(>F)    
Type        2  4491.0  2245.5  12.430 8.963e-06 ***
Residuals 175 31615.1   180.7                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In both cases the Pr(>F) value (the p-value) is far smaller than 0.05, so at the 95% level of confidence we conclude that the means differ across the three Types.
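The columns of each table are linked: Mean Sq is Sum Sq divided by Df, F value is the ratio of the two Mean Sq entries, and Pr(>F) is the probability that an F(2, 175) variable exceeds that F value. As a quick check, the sketch below copies the rounded Sum Sq and Df values from the two tables above and rebuilds F and Pr(>F) with pf(); the results agree with the printed output up to that rounding.

```r
# Rebuild F and Pr(>F) from the rounded Sum Sq and Df values printed
# in the two ANOVA tables above (agreement is only up to rounding).
tab <- data.frame(
  response = c("Phenols", "Magnesium"),
  ss_type  = c(35.857, 4491.0),    # "Type" row, Sum Sq
  ss_resid = c(33.472, 31615.1),   # "Residuals" row, Sum Sq
  df_type  = 2,
  df_resid = 175
)
tab$F <- (tab$ss_type / tab$df_type) / (tab$ss_resid / tab$df_resid)
tab$p <- pf(tab$F, tab$df_type, tab$df_resid, lower.tail = FALSE)
print(tab)
```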

If, however, we look at just Types 2 and 3 and compare the means of Magnesium for these two groups:



> wine23 <- wine[Type!=1,]
> attach(wine23)
> anova(lm(Magnesium ~ Type))

Analysis of Variance Table

Response: Magnesium
           Df  Sum Sq Mean Sq F value  Pr(>F)  
Type        1   649.8   649.8  3.0141 0.08518 .
Residuals 117 25221.9   215.6                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With a Pr(>F) of 0.08518, which is larger than 0.05, the means of Magnesium for these two groups are not significantly different at the 95% level. The difference is, however, significant at the 90% level of confidence. This is indicated by the period following the p-value in the output, which the Signif. codes legend below the table associates with p-values between 0.05 and 0.1 (10%).
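The star codes are simply bins of the p-value. The sketch below maps p-values to the codes in the legend and shows the decision at the 95% and 90% levels for the three tests above. signif_code() is a hypothetical helper, treating each upper cut-point as inclusive is our assumption about boundary handling, and 2.2e-16 is used as a stand-in for the Phenols result, which R only reports as "< 2.2e-16".

```r
# Map p-values to the "Signif. codes" legend:
#   0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Hypothetical helper; upper cut-points are treated as inclusive.
signif_code <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", " "),
                   include.lowest = TRUE))
}

# p-values from the three ANOVA tables (Phenols is only "< 2.2e-16").
p_values <- c(Phenols = 2.2e-16, Magnesium = 8.963e-06,
              Magnesium_types_2_3 = 0.08518)
print(data.frame(p         = p_values,
                 code      = signif_code(p_values),
                 sig_at_95 = p_values < 0.05,
                 sig_at_90 = p_values < 0.10))
```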

Copyright © 2004-2006