Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


R: An Introduction to the Language

R is a statistical and data mining package consisting of a programming language and a graphics system. It is used throughout this book to illustrate data mining procedures, and it is the programming language used to implement the Rattle graphical user interface for data mining (see Chapter 2). If you are moving to R from SAS or SPSS, you may find the document http://oit.utk.edu/scc/RforSAS&SPSSusers.doc helpful.

The following sections of this chapter introduce the basics of R. Many of the examples presented can be readily copied into an R console to facilitate learning. You will also find many examples on the R-help mailing list at https://stat.ethz.ch/mailman/listinfo/r-help.

Learning by example is a powerful learning paradigm. Motivated by the paradigm of "programming by example" (Cypher, 1993), the intention is that you will be able to replicate the examples from the book and then fine-tune them to suit your own needs. This, of course, is also one of the underlying principles of Rattle, as described in Chapter 2, where all of the R commands that are used under the graphical user interface are exposed to the user. This makes Rattle a useful teaching tool for learning R for the specific task of data mining, and also a good memory aid!

So R is a language. The basic modus operandi is to write sentences expressed in this language. After a while you will want to do more than issue single, simple commands (sentences), and will instead write paragraphs and full novels in the language! R script files (often with the .R filename extension) are the place to write them. You can re-run your scripts to transform, at will and automatically, your source data into information and knowledge.
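As a minimal sketch of what such a script might look like, the following strings a few "sentences" together. The variable names and data values are made up for illustration and do not come from the book.

```r
# A tiny script: each line is one "sentence" in the R language.
# All values below are illustrative, not real data.
heights <- c(1.62, 1.75, 1.80, 1.68)   # heights in metres
weights <- c(58, 72, 85, 64)           # weights in kilograms

bmi <- weights / heights^2             # vectorised arithmetic, element by element
mean_bmi <- mean(bmi)                  # a single summary value

print(round(bmi, 1))
print(round(mean_bmi, 1))
```

If these lines were saved to a file, say a hypothetical bmi.R, they could be re-run at any time from the R console with source("bmi.R").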

This chapter begins with an overview of some of the key advantages (and disadvantages) of using R and continues with a guide to interacting with R. For data mining purposes the recommended interface is the simple-to-use Rattle (Chapter 2), although more advanced users will prefer the powerful Emacs editor, augmented with the ESS package. Both run under GNU/Linux, Mac/OSX, and MS/Windows. This is a personal preference, and you may prefer some of the alternatives we discuss; the choice is yours.

Direct interaction with R has a steeper learning curve than using GUI-based systems, but once you are comfortable with R, performing operations over the same or similar datasets becomes very easy using its programming language interface. For the R beginner, using a GUI like Rattle, where all underlying R commands are available for your perusal and direct pasting into R itself, may be a good first step.

Let's start with some of the advantages of using R:

Whilst the advantages might flow from the pen with a great deal of enthusiasm, it is useful to note some of the disadvantages or weaknesses of R, even if they are perhaps transitory!

The remaining sections of this chapter can generally be skipped when reading through the book, particularly if you are using Rattle. They provide a basic reference guide to using R, and in particular some of its programming capabilities. While chapter [*] deals in detail with creating data in R, we introduce some of the basics here. The most basic needs include creating simple datasets, being familiar with the basic data types and programming concepts, and knowing how to get help.
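The basic needs just listed can be sketched in a few lines of R. This is only an illustration using base R functions; the object names (n, x, s, b, ds) and all values are assumptions made up for the example.

```r
# Basic data types (all values here are illustrative).
n <- 42L             # integer
x <- 3.14            # numeric (double)
s <- "data mining"   # character
b <- TRUE            # logical

print(c(class(n), class(x), class(s), class(b)))

# A simple dataset: a data frame with one row per observation.
ds <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(0.5, 0.8, 0.3),
  stringsAsFactors = FALSE   # keep 'name' as character in every R version
)

str(ds)             # structure: column names and types
print(summary(ds))  # per-column summaries

# Getting help is mainly useful at the console, so only run it interactively.
if (interactive()) {
  help(mean)                  # documentation for one function (same as ?mean)
  help.search("data frame")   # search the installed documentation
}
```

Typing the name of an object, such as ds, at the console prints it, which is often the quickest way to check what you have created.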



Copyright © 2004-2006 [email protected]