 DATA MINING
Desktop Survival Guide
by Graham Williams

### Elements

```
> letters                     # "a" "b" "c" [...] "z"
> letters[10]                 # "j"
> letters[10:15]              # "j" "k" "l" "m" "n" "o"
> letters[c(1, 2, 4, 8, 16)]  # "a" "b" "d" "h" "p"
> letters[-(10:26)]           # "a" "b" "c" "d" "e" "f" "g" "h" "i"
```

An operator (or function) can be applied to a vector to return a vector. This is particularly useful for comparison and logical operators, which return a vector of logical (TRUE/FALSE) values that can then be used to select specific elements of a vector:

```
> letters > "j"                           # FALSE FALSE FALSE [...] TRUE
> letters[letters > "j"]                  # "k" "l" "m" "n" [...] "y" "z"
> letters[letters > "w" | letters < "e"]  # "a" "b" "c" "d" "x" "y" "z"
```
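The same mechanism works on numeric vectors. The following is a minimal sketch, using a made-up `ages` vector that is not part of the original text, showing how the logical vector can also be turned into positions with `which()` or counted with `sum()`:

```r
# Logical indexing on a numeric vector (the values are made up for illustration).
ages <- c(23, 41, 17, 65, 30)

over30    <- ages > 30      # logical vector, one TRUE/FALSE per element
selected  <- ages[over30]   # keeps only the elements where over30 is TRUE
positions <- which(over30)  # integer positions of the TRUE values
count     <- sum(over30)    # TRUE counts as 1, so this counts the matches

print(selected)
print(positions)
print(count)
```

Storing the logical vector in a variable first, as here, is only a readability choice; `ages[ages > 30]` selects the same elements.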

Here's a useful trick to avoid dividing by zero, which would otherwise give an infinite answer (Inf). We index the result with a condition that drops the elements where the divisor is zero:

```
> x <- c(0.28, 0.55, 0, 2)
> y <- c(0.53, 1.34, 1.2, 2.07)
> sum(((x-y)^2/x))
[1] Inf
> sum(((x-y)^2/x)[x!=0]) # Exclude the zeros
[1] 1.360392
```
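If the calculation is needed in more than one place, it could be wrapped in a small function. This is only a sketch: the name `chi_sum` is hypothetical, and it subsets `x` and `y` before dividing rather than after, which gives the same sum without ever producing `Inf`:

```r
# Hypothetical helper: sum of (x - y)^2 / x over the elements where x is non-zero.
chi_sum <- function(x, y) {
  keep <- x != 0                        # logical mask of the usable elements
  sum((x[keep] - y[keep])^2 / x[keep])  # divide only where it is safe
}

x <- c(0.28, 0.55, 0, 2)
y <- c(0.53, 1.34, 1.2, 2.07)

result <- chi_sum(x, y)
print(result)
```

With the `x` and `y` above, this should agree with the 1.360392 obtained from the one-line version.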

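We can also generate a random subset of our data. Here sample() draws 1000 distinct row numbers (it samples without replacement by default), so the dataset must have at least 1000 rows: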

```
> subdataset <- dataset[sample(seq(1, nrow(dataset)), 1000),]
```
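A random subset changes on every run unless the random number generator is seeded. The sketch below assumes a made-up data frame standing in for `dataset`, and the seed value 42 is arbitrary. It also guards against asking for more rows than exist:

```r
set.seed(42)  # arbitrary seed, so the same rows are drawn on every run

# A made-up data frame standing in for `dataset`.
dataset <- data.frame(id = 1:5000, value = rnorm(5000))

n    <- min(1000, nrow(dataset))  # never request more rows than are available
rows <- sample(nrow(dataset), n)  # n distinct row numbers, without replacement
subdataset <- dataset[rows, ]

print(dim(subdataset))
```

Given a single number, `sample()` draws from `1:nrow(dataset)`, so it is equivalent to the `seq(1, nrow(dataset))` form used above.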

We can select elements meeting set inclusion conditions. The first example selects the rows of a data frame whose colour is green or blue. The second selects the rows whose colour occurs more than 11 times in the data.

 ```> ds[ds\$colour %in% c("green", "blue"),] > ds[ds\$colour %in% names(which(table(ds\$colour) > 11)),] ```