DATA MINING
Desktop Survival Guide by Graham Williams
The Rattle interface is based on a set of tabs through which we progress. For any tab, once we have set up the required information, we click the Execute button to perform the actions. Take a moment to explore the interface a little. Notice the Help menu, and observe that the layout of the help mirrors the layout of the tabs.
We will work through the functionality of Rattle using a simple dataset, the audit dataset, which is supplied as part of the Rattle package (it is also available for download as a CSV file from http://rattle.togaware.com/audit.csv). This is an artificial dataset of 2,000 fictional clients who have been audited, perhaps for compliance with regard to the amount of a tax refund being claimed. For each case, an outcome is recorded (whether or not the taxpayer's claims had to be adjusted), together with the amount of any resulting adjustment.
The dataset contains only 2,000 entities so that, for illustrative purposes, model building remains relatively quick. Real data typically contains tens of thousands of entities or more (often millions). The audit dataset contains 13 columns (or variables), the first being a unique client identifier. Again, real data will often have one hundred or more variables.
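Before loading the data into Rattle, it can help to confirm that a downloaded copy of the CSV file matches the description above: the number of entities (rows), the number of variables (columns), and whether the first column really is a unique identifier. Rattle itself is driven from R; the following is only a minimal sketch in Python using the standard `csv` module, and the function name `describe_csv` is hypothetical. It assumes the file has a single header row.

```python
import csv
from typing import Iterable, Tuple


def describe_csv(lines: Iterable[str], id_column: int = 0) -> Tuple[int, int, bool]:
    """Return (number of entities, number of variables, whether the ID column is unique).

    Hypothetical helper: assumes the first line of the file is a header row.
    """
    reader = csv.reader(lines)
    header = next(reader)                    # column (variable) names
    rows = [row for row in reader if row]    # skip any blank lines
    ids = [row[id_column] for row in rows]   # values of the identifier column
    return len(rows), len(header), len(set(ids)) == len(ids)
```

For a local copy of the file, one might call it as `with open("audit.csv", newline="") as f: print(describe_csv(f))`; if the file matches the description in the text, the report should show 2,000 entities, 13 variables, and a unique identifier column.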
We proceed through the typical steps of a data mining project: loading and selecting the data, exploring it, and finally building and evaluating models.
The data mining process steps through each tab, from left to right, performing the corresponding actions. For any tab, the modus operandi is to configure the available options and then click the Execute button (or press F5) to perform the appropriate tasks. Note that nothing is performed until Execute is invoked, whether through the Execute button, the F5 key, or the Execute item of the Tools menu.
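The configure-then-execute pattern described above can be illustrated with a small model: setting options on a tab only records them, and the work happens when the tab is executed, with tabs taken from left to right. This is a hypothetical Python sketch of the interaction pattern only; the `Tab` class and `run_left_to_right` function are illustrative names and do not reflect how Rattle is implemented.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Tab:
    """Hypothetical tab: options are configured first and acted on only by execute()."""

    name: str
    action: Callable[[Dict[str, Any]], Any]
    options: Dict[str, Any] = field(default_factory=dict)

    def configure(self, **opts: Any) -> None:
        # Changing options only records them; no work is performed here.
        self.options.update(opts)

    def execute(self) -> Any:
        # The underlying action runs only now, with a copy of the current options.
        return self.action(dict(self.options))


def run_left_to_right(tabs: List[Tab]) -> List[Any]:
    """Execute each tab in order, mimicking a left-to-right pass through the interface."""
    return [tab.execute() for tab in tabs]
```

Keeping configuration and execution separate means a user can adjust several options without triggering a potentially slow computation after every change.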
The Status Bar at the base of the window indicates when the action has completed. Messages from R, such as error messages, appear in the R console from which Rattle was started, although Rattle captures many R error messages and displays them in a popup instead.
The R code that is executed behind the scenes appears in the Log tab, allowing us to review the R commands that perform each data mining task. These snippets can be copied as text from the Log tab and pasted into the R console from which Rattle is running, where they are executed directly. A user can therefore rely on Rattle for basic tasks, yet draw on the full power of R as needed, perhaps by using command options that the Rattle interface does not expose. The user can also save the whole session to a file, both as a record of the actions taken and, possibly, to be run directly and automatically through R itself at a later time.
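The idea of a log that doubles as a re-runnable script can be sketched in a few lines: each snippet is executed and its source recorded, the recorded snippets are saved in order as a plain script, and that script can later be replayed from scratch. Rattle records R commands; the following is only a Python analogue under that assumption, and the names `SessionLog` and `replay` are hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, List


class SessionLog:
    """Hypothetical analogue of a command log: run snippets, keep their source, save them."""

    def __init__(self) -> None:
        self.entries: List[str] = []          # source of each successfully run snippet
        self.namespace: Dict[str, Any] = {}   # variables created by the session

    def run(self, code: str) -> None:
        # Execute the snippet first; it is recorded only if it raised no exception.
        exec(code, self.namespace)
        self.entries.append(code)

    def save(self, path: Path) -> None:
        # The saved file is simply the snippets in execution order, i.e. a plain script.
        path.write_text("\n".join(self.entries) + "\n", encoding="utf-8")


def replay(path: Path) -> Dict[str, Any]:
    """Re-run a saved session in a fresh namespace and return the resulting variables."""
    namespace: Dict[str, Any] = {}
    exec(path.read_text(encoding="utf-8"), namespace)
    return namespace
```

Because only snippets that ran without error are recorded, the saved file reproduces the successful steps of the session in the order they were taken.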
Copyright © 2004-2006. Support further development through the purchase of the PDF version of the book.