DATA MINING
Desktop Survival Guide
by
Graham Williams
R for the Data Miner
Subsections
R: An Introduction to the Language
Obtaining and Installing R
Installing on Debian GNU/Linux
Installing on MS/Windows
Install MS/Windows Version Under GNU/Linux
Interacting With R
Basic Command Line
Emacs and ESS
Windows, Icons, Mouse, Pointer (WIMP)
Evaluation
Help
Assignment
Libraries and Packages
Searching for Objects
Package Management
Information About a Package
Testing Package Availability
Basic Programming in R
Folders and Files
Flow Control
Functions
Apply
Methods
Objects
System
Running System Commands
System Parameters
Misc
Internet
Memory Management
Memory Usage
Garbage Collection
Errors
Frivolous
Sudoku
Further Resources
Using R
Specific Purposes
Survey Analysis
Data
Data Manipulation in R
Data Types
Numbers
Strings
Building Strings
Splitting Strings
Substitution
Trim Whitespace
Evaluating Strings
Logical
Vectors
Arrays
Lists
Sets
Matrices
Data Frames
Accessing Columns
Time and Dates
Space
General Manipulation
Factors
Elements
Rows and Columns
Finding Index of Elements
Partitions
Head and Tail
Reverse a List
Sorting
Unique Values
Saving and Loading
R
Data and Objects
Formatted Output
Automatically Generate Filenames
Obtaining Data
Reading Data
Vector Data
R Datasets
The Iris Dataset
CSV Data
The Wine Dataset
The Cardiac Arrhythmia Dataset
The Adult Survey Dataset
Using SQLite
ODBC Data
Database Connection
Excel
Access
Clipboard Data
Map Data
Other Data Formats
Fixed Width Data
Global Positioning System
Documenting a Dataset
Common Data Problems
Graphics in R
Basic Plot
Controlling Axes
Arrow Axes
Legends and Points
Colour
Symbols
Multiple Plots
Other Graphic Elements
Maths in Labels
Making an Animation
Animated Mandelbrot
Adding a Logo to a Graphic
Graphics Devices Setup
Screen Devices
Multiple Devices
File Devices
Multiple Plots
Copy and Print Devices
Graphics Parameters
Plotting Region
Locating Points on a Plot
Scientific Notation and Plots
Understanding Data
Single Variable Overviews
Textual Summaries
Multiple Line Plots
Separate Line Plots
Pie Chart
Fan Plot
Stem and Leaf Plots
Histogram
Barplot
Trellis Histogram
Histogram Uneven Distribution
Density Plot
Basic Histogram
Basic Histogram with Density Curve
Practical Histogram
Multiple Variable Overviews
Pivot Tables
Scatterplot
Scatterplot with Marginal Histograms
Multi-Dimension Scatterplot
Correlation Plot
Colourful Correlations
Projection Pursuit
RADVIZ
Parallel Coordinates
Measuring Data Distributions
Textual Summaries
Boxplot
Multiple Boxplots
Boxplot by Class
Tuning a Boxplot
Boxplot From ggplot
Violin Plot
What Distribution
Labelling Outliers
Miscellaneous Plots
Line and Point Plots
Matrix Data
Multiple Plots
Aligned Plots
Probability Scale
Network Plot
Sunflower Plot
Stairs Plot
Graphing Means and Error Bars
Bar Charts With Segments
Bar Plot With Means
Multi-Line Title
Mathematics
Plots for Normality
Basic Bar Chart
Bar Chart Options
Multiple Dot Plots
Alternative Multiple Dot Plots
3D Plot
Box and Whisker Plot
Box and Whisker Plot: With Means
Clustered Box Plot
Perspective Plots
Star Plot
Residuals Plot
Dates and Times
Simple Time Series
Multiple Time Series
Plot Time Series
Plot Time Series with Axis Labels
Using GGobi
Map Displays
Further Resources
Preparing Data
Data Selection and Extraction
Training and Test Datasets
Data Cleaning
Review Data
Removing Duplicates
Selectively Changing Vector Values
Replace Indices By Names
Missing Values
Remove Levels from a Factor
Removing Outliers
Variable Manipulations
Remove Columns
Reorder Columns
Remove Non-Numeric Columns
Remove Variables with no Variance
Cleaning the Wine Dataset
Cleaning the Cardiac Dataset
Cleaning the Survey Dataset
Imputation
Data Linking
Simple Linking
Record Linkage
Data Transformation
Aggregation
Sum of Columns
Normalising Data
Binning
Interpolation
Outlier Detection
Variable Selection
Building Models
Unbalanced Classification
Building Models
Outlier Analysis
Temporal Analysis
Evaluation
Basics
Basic Measures
Cross Validation
Graphical Performance Measures
Lift
The ROC Curve
Other Examples
10-Fold Cross Validation
Area Under Curve
Calibration Curves
Measurement Issues
Overfitting
Imbalanced Decisions
Copyright © 2004-2006
Support further development through the purchase of the PDF version of the book.
Brought to you by Togaware.