DATA MINING
Desktop Survival Guide
by Graham Williams

Organisation

Part I constitutes a complete guide to using Rattle for data mining.

In Chapter 2 we introduce Rattle as a graphical user interface (GUI) developed for making any data mining project a lot simpler. This covers the installation of both R and Rattle, as well as basic interaction with Rattle.

Chapters to then detail the steps of the data mining process, corresponding to the straightforward interface presented through Rattle. We describe how to get data into Rattle, how to select variables, and how to perform sampling in Chapter . Chapter then reviews various approaches to exploring the data, so that we can gain insight into the data, understand its distribution, and assess the appropriateness of any modelling.

Chapters to cover modelling, including descriptive and predictive modelling, and text mining. The evaluation of model performance and the deployment of models are covered in Chapter . Chapter provides an introduction to migrating from Rattle to the underlying R system. It does not attempt to cover all aspects of interacting with R but is sufficient for a competent programmer or software engineer to extend and further fine-tune the modelling performed in Rattle. Chapter covers troubleshooting within Rattle.

Part II delves much deeper into the use of R for data mining. In particular, R is introduced as a programming language for data mining. Chapter introduces the basic environment of R. Data and data types are covered in Chapter and R's extensive capabilities in producing stunning graphics are introduced in Chapter . We then pull together the capabilities of R to help us understand data in Chapter . We then move on to preparing our data for data mining in Chapter , building models in Chapter , and evaluating our models in Chapter .

Part reviews the algorithms employed in data mining. The encyclopedic-style overview covers many tools and techniques deployed within data mining, ranging from decision tree induction and association rules to multivariate adaptive regression splines and patient rule induction methods. We also cover standards for sharing data and models.

We continue the Desktop Guide with a snapshot of some current alternative open source and then commercial data mining products in Part , Open Source Products, and Part , Commercial Off The Shelf Products.

Brought to you by Togaware.