GNU Octave - Basic Statistical Functions

GNU Octave Manual Version 3
by John W. Eaton, David Bateman, Søren Hauberg
Paperback (6"x9"), 568 pages
ISBN 095461206X


24.2 Basic Statistical Functions

Octave also supports various helpful statistical functions.

Function File: mahalanobis (x, y)

Return the Mahalanobis' D-square distance between the multivariate samples x and y, which must have the same number of components (columns) but may have a different number of observations (rows).
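The D-square statistic can be sketched in plain Python as below. This is an illustration and not Octave's mahalanobis.m. It assumes the covariance matrix W is the pooled within-sample covariance: cross-products of deviations from each sample's own column means, summed over both samples and divided by nx + ny - 2. The helper names column_means, solve and mahalanobis_d2 are hypothetical. The distance is (mx - my) W^-1 (mx - my)', computed by solving a linear system instead of forming the inverse.

```python
def column_means(rows):
    """Mean of each column of a matrix stored as a list of rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    p = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented [a | b]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(p):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]


def mahalanobis_d2(x, y):
    """D-square distance between samples x and y (rows = observations)."""
    p = len(x[0])
    if len(y[0]) != p:
        raise ValueError("x and y must have the same number of columns")
    mx, my = column_means(x), column_means(y)
    # Assumed pooled covariance: deviation cross-products from both samples,
    # each taken about its own mean, divided by nx + ny - 2.
    w = [[0.0] * p for _ in range(p)]
    for rows, mean in ((x, mx), (y, my)):
        for row in rows:
            d = [v - mu for v, mu in zip(row, mean)]
            for i in range(p):
                for j in range(p):
                    w[i][j] += d[i] * d[j]
    dof = len(x) + len(y) - 2
    w = [[v / dof for v in row] for row in w]
    diff = [a - b for a, b in zip(mx, my)]
    z = solve(w, diff)  # z = W^-1 (mx - my), without forming the inverse
    return sum(a * b for a, b in zip(diff, z))
```

For example, the one-column samples [[1], [2], [3]] and [[4], [5], [6]] have pooled variance 1 and mean difference 3, so the sketch returns 9.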

Function File: center (x)
Function File: center (x, dim)

If x is a vector, subtract its mean. If x is a matrix, do the above for each column. If the optional argument dim is given, perform the above operation along this dimension.

Function File: studentize (x, dim)

If x is a vector, subtract its mean and divide by its standard deviation.

If x is a matrix, do the above along the first non-singleton dimension. If the optional argument dim is given, operate along this dimension.
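A minimal Python sketch of studentize for a vector, and column by column for a matrix stored as a list of rows (the dim = 1 case). Its first step, subtracting the mean, is the operation that center performs. The sketch assumes the standard deviation uses the n - 1 divisor, which is what statistics.stdev computes. All function names are hypothetical.

```python
import statistics


def center(v):
    """Subtract the mean from every element of a vector."""
    m = statistics.fmean(v)
    return [x - m for x in v]


def studentize(v):
    """Center a vector, then divide by its sample standard deviation."""
    s = statistics.stdev(v)  # n - 1 divisor (assumed to match Octave's std)
    return [x / s for x in center(v)]


def studentize_columns(rows):
    """Studentize each column of a row-major matrix independently."""
    cols = [studentize(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

A constant vector has zero standard deviation, so this sketch raises ZeroDivisionError for it rather than guessing at a convention.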

Function File: c = nchoosek (n, k)

Compute the binomial coefficient, or all combinations of the elements of n. If n is a scalar, calculate the binomial coefficient of n and k, defined as

$$\binom{n}{k} = \frac{n\,(n-1)\,(n-2)\cdots(n-k+1)}{k!} = \frac{n!}{k!\,(n-k)!}$$

If n is a vector generate all combinations of the elements of n, taken k at a time, one row per combination. The resulting c has size [nchoosek (length (n), k), k].

See also bincoeff
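Both forms of nchoosek can be illustrated in Python. The scalar form evaluates the product formula with exact integer arithmetic, and the vector form lists combinations with itertools.combinations. The function names binomial and all_combinations are hypothetical, and the row order produced by itertools is not claimed to match Octave's.

```python
from itertools import combinations
from math import comb, factorial


def binomial(n, k):
    """n (n-1) ... (n-k+1) / k!, using exact integer division."""
    num = 1
    for i in range(k):
        num *= n - i  # falling factorial n (n-1) ... (n-k+1)
    return num // factorial(k)  # a product of k consecutive integers is divisible by k!


def all_combinations(v, k):
    """All k-element combinations of the elements of v, one row each."""
    return [list(c) for c in combinations(v, k)]


# The listing has binomial(len(v), k) rows of length k.
assert len(all_combinations(range(5), 3)) == binomial(5, 3) == comb(5, 3)
```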

Function File: perms (v)

Generate all permutations of v, one row per permutation. The result has size factorial (n)-by-n, where n is the length of v.

As an example, perms([1, 2, 3]) returns a 6-by-3 matrix whose rows are the six permutations of 1, 2 and 3.
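The size rule can be checked quickly in Python with itertools.permutations. The wrapper name all_permutations is hypothetical. Python emits permutations in lexicographic order of positions, which may differ from Octave's row order.

```python
from itertools import permutations
from math import factorial


def all_permutations(v):
    """All permutations of v, one row (list) per permutation."""
    return [list(p) for p in permutations(v)]


rows = all_permutations([1, 2, 3])
# factorial(n) rows, each of length n
assert len(rows) == factorial(3) and all(len(r) == 3 for r in rows)
```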

Function File: values (x)

Return the distinct values of x in a column vector, arranged in ascending order.

As an example, values([1, 2, 3, 1]) returns the vector [1, 2, 3].

Function File: [t, l_x] = table (x)

Function File: [t, l_x, l_y] = table (x, y)

Create a contingency table t from data vectors. The l_x and l_y vectors are the corresponding levels.

Currently, only 1- and 2-dimensional tables are supported.
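The counting behind a contingency table can be sketched with collections.Counter. In this hypothetical contingency_table, the levels are the sorted distinct values of each vector, which is the same list that values returns. In the two-vector case, rows are assumed to index levels of x and columns to index levels of y.

```python
from collections import Counter


def contingency_table(x, y=None):
    """Return (t, l_x) for one vector, or (t, l_x, l_y) for two."""
    lx = sorted(set(x))  # levels of x in ascending order
    if y is None:
        counts = Counter(x)
        return [counts[a] for a in lx], lx
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    ly = sorted(set(y))
    counts = Counter(zip(x, y))  # frequency of each (x, y) pair
    t = [[counts[(a, b)] for b in ly] for a in lx]
    return t, lx, ly
```

For example, x = [1, 1, 2, 2] and y = [0, 1, 1, 1] give levels [1, 2] and [0, 1] and the table [[1, 1], [0, 2]].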

Function File: spearman (x, y)

Compute Spearman's rank correlation coefficient rho for each of the variables specified by the input arguments.

For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.

spearman (x) is equivalent to spearman (x, x).

For two data vectors x and y, Spearman's rho is the correlation of the ranks of x and y.

If x and y are drawn from independent distributions, rho has zero mean and variance 1 / (n - 1), where n is the number of observations, and is asymptotically normally distributed.
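Spearman's rho for two vectors can be sketched in Python as the ordinary (Pearson) correlation of their ranks. Ties are assumed to receive the average of the ranks they span, which is one common reading of "adjusted for ties" in the ranks entry. The names tie_adjusted_ranks, pearson and spearman_rho are hypothetical.

```python
import math


def tie_adjusted_ranks(v):
    """1-based ranks of v; tied values share the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1  # extend the block of tied values
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


def spearman_rho(x, y):
    """Correlation of the ranks of x and y."""
    return pearson(tie_adjusted_ranks(x), tie_adjusted_ranks(y))
```

Because only ranks enter the calculation, any strictly increasing relationship gives rho = 1. For example, [1, 2, 3, 4] against [1, 4, 9, 16] gives 1.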

Function File: run_count (x, n)

Function File: run_count (x, n, dim)

Count the upward runs of x along the first non-singleton dimension, tallying runs of length 1, 2, ..., n-1 separately and all runs of length greater than or equal to n together. If the optional argument dim is given, operate along this dimension.
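A minimal Python sketch of the tally for a single vector. It assumes that a run continues while each value is greater than or equal to its predecessor and ends at the first strict decrease. That treatment of ties is an assumption, not something this entry states. The name upward_run_counts is hypothetical.

```python
def upward_run_counts(x, n):
    """Return [c1, ..., cn]: ck counts runs of length k for k < n, and
    cn counts all runs of length >= n.

    Assumption: a run continues while each value is >= its predecessor
    and ends at the first strict decrease.
    """
    counts = [0] * n
    if not x:
        return counts
    length = 1
    for prev, cur in zip(x, x[1:]):
        if cur >= prev:
            length += 1  # the run keeps climbing (or stays level)
        else:
            counts[min(length, n) - 1] += 1  # a decrease closes the run
            length = 1
    counts[min(length, n) - 1] += 1  # close the final run
    return counts
```

For example, [1, 2, 3, 2, 5, 1] splits into runs of length 3, 2 and 1, so with n = 3 the result is [1, 1, 1].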

Function File: ranks (x, dim)

If x is a vector, return the (column) vector of ranks of x adjusted for ties.

If x is a matrix, do the above along the first non-singleton dimension. If the optional argument dim is given, operate along this dimension.

Function File: range (x)

Function File: range (x, dim)

If x is a vector, return the range, i.e., the difference between the maximum and the minimum, of the input data.

If x is a matrix, do the above for each column of x.

If the optional argument dim is supplied, work along dimension dim.

Function File: probit (p)

For each component of p, return the probit (the quantile of the standard normal distribution) of p.

Function File: logit (p)

For each component of p, return the logit of p defined as

logit(p) = log (p / (1-p))

Function File: cloglog (x)

Return the complementary log-log function of x, defined as

cloglog(x) = - log (- log (x))
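The three link functions above translate directly to Python's standard library. For probit, statistics.NormalDist().inv_cdf is the inverse of the standard normal CDF. These are scalar sketches with no handling of arguments outside (0, 1). The Python names simply mirror the Octave ones for readability.

```python
import math
from statistics import NormalDist


def probit(p):
    """Quantile of the standard normal distribution at p, 0 < p < 1."""
    return NormalDist().inv_cdf(p)


def logit(p):
    """log (p / (1 - p)), 0 < p < 1."""
    return math.log(p / (1 - p))


def cloglog(x):
    """Complementary log-log: -log (-log (x)), 0 < x < 1."""
    return -math.log(-math.log(x))
```

For example, probit(0.975) is about 1.96, the familiar two-sided 95% normal critical value.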

Function File: kendall (x, y)

Compute Kendall's tau for each of the variables specified by the input arguments.

For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.

kendall (x) is equivalent to kendall (x, x).

For two data vectors x, y of common length n, Kendall's tau is the correlation of the signs of all rank differences of x and y; i.e., if both x and y have distinct entries, then

$$\tau = \frac{1}{n\,(n-1)} \sum_{i,j} \operatorname{sign}\bigl(q(i) - q(j)\bigr)\,\operatorname{sign}\bigl(r(i) - r(j)\bigr)$$

in which the q(i) and r(i) are the ranks of x and y, respectively.

If x and y are drawn from independent distributions, Kendall's tau is asymptotically normal with mean 0 and variance (2 * (2n+5)) / (9 * n * (n-1)).
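The double sum can be evaluated directly in Python. Ranks are an increasing function of the data, so the sign of a rank difference equals the sign of the corresponding data difference, and this sketch works on the raw values. Like the formula, it assumes distinct entries. The name kendall_tau is hypothetical, and the O(n^2) loop mirrors the formula rather than aiming for speed.

```python
def sign(v):
    """-1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """1 / (n (n-1)) times the sum over all ordered pairs (i, j) of
    sign(x_i - x_j) * sign(y_i - y_j); assumes distinct entries."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length n >= 2")
    total = 0
    for i in range(n):
        for j in range(n):  # i == j contributes 0
            total += sign(x[i] - x[j]) * sign(y[i] - y[j])
    return total / (n * (n - 1))
```

For example, [1, 2, 3] against [1, 3, 2] has two concordant pairs and one discordant pair, so tau = 1/3.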

Function File: iqr (x, dim)

If x is a vector, return the interquartile range, i.e., the difference between the upper and lower quartile, of the input data.

If x is a matrix, do the above along the first non-singleton dimension of x. If the optional argument dim is given, operate along this dimension.
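A minimal Python version for a vector. It assumes the quartiles come from statistics.quantiles with method="inclusive", which interpolates linearly between order statistics. Several quartile conventions exist and Octave's iqr may use a different one, so results on small samples can differ. The name interquartile_range is hypothetical.

```python
import statistics


def interquartile_range(v):
    """Upper quartile minus lower quartile (needs at least 2 values)."""
    q1, _median, q3 = statistics.quantiles(v, n=4, method="inclusive")
    return q3 - q1
```

For [1, 2, 3, 4, 5] the inclusive quartiles are 2 and 4, so the sketch returns 2.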

Function File: cut (x, breaks)

Create categorical data out of numerical or continuous data by cutting into intervals.

If breaks is a scalar, the data is cut into that many equal-width intervals. If breaks is a vector of break points, the data is divided into length (breaks) - 1 groups.

The returned value is a vector of the same size as x telling which group each point in x belongs to. Groups are labelled from 1 to the number of groups; points outside the range of breaks are labelled by NaN.
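The labelling can be sketched in Python with bisect. The interval convention is an assumption, since this entry does not say which end of each interval is closed. Here, group k is (breaks[k-1], breaks[k]], and group 1 also includes breaks[0]. A scalar breaks is taken to mean that many equal-width groups from min(x) to max(x). The name cut_labels is hypothetical.

```python
import math
from bisect import bisect_left


def cut_labels(x, breaks):
    """Return the 1-based group of each value of x (NaN if outside).

    Assumptions: group k is (breaks[k-1], breaks[k]], group 1 also
    includes breaks[0], and an integer `breaks` means that many
    equal-width groups spanning min(x) .. max(x).
    """
    if isinstance(breaks, int):
        lo, hi = min(x), max(x)
        if lo == hi:
            raise ValueError("cannot split a zero-width range")
        width = (hi - lo) / breaks
        # last break set to hi exactly so the maximum is never lost to rounding
        breaks = [lo + k * width for k in range(breaks)] + [hi]
    labels = []
    for v in x:
        if v < breaks[0] or v > breaks[-1]:
            labels.append(math.nan)  # outside the break points
        elif v == breaks[0]:
            labels.append(1)  # left edge joins the first group
        else:
            labels.append(bisect_left(breaks, v))  # breaks[k-1] < v <= breaks[k]
    return labels
```

For example, with break points [0, 2, 4] the values 1, 2, 3, 4, 5 receive the labels 1, 1, 2, 2 and NaN.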
