Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Elements



> letters		        # a b c [...] z
> letters[10]			# "j"
> letters[10:15]		# "j" "k" "l" "m" "n" "o"
> letters[c(1, 2, 4, 8, 16)]	# "a" "b" "d" "h" "p"
> letters[-(10:26)]		# "a" "b" "c" "d" "e" "f" "g" "h" "i"
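As a small sketch of the same indexing forms on a vector of our own (the vector v and its names are made up for illustration), we can also select elements by name, and a negative index drops elements rather than selecting them:

```r
# Hypothetical named vector, for illustration only.
v <- c(a = 10, b = 20, c = 30, d = 40)

first_two <- v[1:2]          # select by position
by_name   <- v[c("a", "d")]  # select by name
without_b <- v[-2]           # drop the second element, keep the rest

print(first_two)
print(by_name)
print(without_b)
```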

An operator (or function) can be applied to a vector to return a vector of results, one per element. This is particularly useful for comparison operators, which return a vector of logical (TRUE/FALSE) values that can then be used to select specific elements of a vector:

> letters > "j"		        		# FALSE FALSE FALSE [...] TRUE
> letters[letters > "j"]	        	# "k" "l" "m" "n" [...] "y" "z"
> letters[letters > "w" | letters < "e"]	# "a" "b" "c" "d" "x" "y" "z"
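It can help to look at the logical vector itself before using it as an index. The following is a minimal sketch, assuming a locale in which the lowercase letters sort in the usual a to z order (string comparison in R follows the locale's collation); the variable names are ours:

```r
# Logical vector: TRUE where the letter sorts after "w" or before "e".
keep <- letters > "w" | letters < "e"

positions <- which(keep)    # integer positions of the TRUE values
n_kept    <- sum(keep)      # TRUE counts as 1, so this counts the matches
selected  <- letters[keep]  # the matching elements themselves

print(positions)
print(n_kept)
print(selected)
```

Here which() converts the logical vector into the positions it selects, which is handy when the positions are needed elsewhere.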

Here's a useful trick, based on the same kind of selection, to ensure we don't divide by zero, which in R would otherwise give an infinite answer (Inf) rather than an error:

> x <- c(0.28, 0.55, 0, 2)
> y <- c(0.53, 1.34, 1.2, 2.07)
> sum(((x-y)^2/x))                  
[1] Inf
> sum(((x-y)^2/x)[x!=0])            # Exclude the zeros
[1] 1.360392
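The same guard can be written in other ways. As a sketch using the x and y above (the names nonzero, total and total_finite are ours), we can either subset before dividing, so that Inf is never computed, or divide first and keep only the finite terms:

```r
x <- c(0.28, 0.55, 0, 2)
y <- c(0.53, 1.34, 1.2, 2.07)

# Option 1: subset first, so the division by zero never happens.
nonzero <- x != 0
total   <- sum((x[nonzero] - y[nonzero])^2 / x[nonzero])

# Option 2: divide everything, then discard non-finite terms (Inf, NaN, NA).
terms        <- (x - y)^2 / x
total_finite <- sum(terms[is.finite(terms)])

print(round(total, 6))
print(round(total_finite, 6))
```

The second form also discards NaN, which is what 0/0 produces, so it covers the case where both numerator and denominator are zero.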

We can also draw a random subset of our data. Here we sample 1000 rows, without replacement, from a data frame that has at least 1000 rows:



> subdataset <- dataset[sample(nrow(dataset), 1000), ]
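A random subset changes on every run unless the random number generator is seeded. The sketch below uses a made-up data frame standing in for dataset, and the seed value 42 is arbitrary:

```r
# Hypothetical data frame standing in for `dataset`.
dataset <- data.frame(id = 1:5000, value = sqrt(1:5000))

set.seed(42)                             # make the random draw repeatable
rows       <- sample(nrow(dataset), 1000)  # 1000 distinct row numbers
subdataset <- dataset[rows, ]

print(dim(subdataset))
```

Because sample() draws without replacement by default, no row appears twice in the subset.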

We can select elements meeting set inclusion conditions. Here we first select the rows of a data frame having particular colours, and then the rows whose colour occurs more than 11 times in the data.

> ds[ds$colour %in% c("green", "blue"),]
> ds[ds$colour %in% names(which(table(ds$colour) > 11)),]
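To see how the second selection works, it helps to break it into steps. This is a minimal sketch on a made-up data frame; the colours and sizes are invented, and the threshold is lowered from 11 to 1 so that the tiny example selects something:

```r
# Hypothetical data frame; the colours and sizes are made up.
ds <- data.frame(
  colour = c("red", "green", "blue", "green", "red", "green"),
  size   = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Rows whose colour is in a fixed set.
green_blue <- ds[ds$colour %in% c("green", "blue"), ]

# Rows whose colour occurs more than once.
counts   <- table(ds$colour)           # frequency of each colour
frequent <- names(which(counts > 1))   # colours above the threshold
common   <- ds[ds$colour %in% frequent, ]

print(green_blue)
print(common)
```

Here table() counts each colour, which() keeps the counts above the threshold, and names() recovers the colour labels to test against with %in%.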



Copyright © 2004-2006 [email protected]